


(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property 
Organization 

International Bureau 

(43) International Publication Date 
3 March 2005 (03.03.2005)




(10) International Publication Number 

PCT WO 2005/020125 M 



(51) International Patent Classification 7: G06F 19/00



(21) International Application Number: 

PCT/US2004/027022



(22) International Filing Date: 20 August 2004 (20.08.2004)

(25) Filing Language: English

(26) Publication Language: English
(30) Priority Data: 



60/496,657



20 August 2003 (20.08.2003) US



— (ir\ Applicant (for all designated -Stales except USy.Mtir 

=5.:., A;fc>tiM^ 

Sgfe Waltham. MA- 0245 1: (US), 



;{ii):i)e£j^i& indicated, for every? 

kind of national protection available): AE» AG, AL, AM» 
AT, AU, AZ, BA.BB, BG;BJUBW, BY^, CA,CH, CN,, 

GB, GD, GE,. GH; GM, HR, HU, ID,, IL, IN t *IS, JP, KE t . 
KG, KP, KR, KZ, LC, LK; LR t <LS, LT, LU; LYMA/'MD, 
MG, MK, MN. MW, MX.ilZ; NA, NI; NO, NZ, OM,PG, 
PH,PUPi;R0, : RU,sd,^ 

TN, TR, TTi TZ,iuAVUG^US, UZ^C;m ; m ZA, 2M[ 

zw. : • t • • 

(84) Designated States (unieistotherw for every 

• i lkind^ . ARIFO (BW..GH, 

G^l^I^ 

; ^rEu^ianXAM, AZ, B Y/KG, KZ, *fD, : RU, TJi TM), 
European (AT, BE, BG, CH, C Y, CZ, DE, DK,KE, ES, H, 
FRX3B , GR, TIU, TF^ IT, LIT, MQ NL, PL, PT, RO, SE; SI, 
SK, TR), OAPI (BF, BJ, GF, CG, Gl, CM,' GA, GN, GQ; 



|;:(72).lnyehtor;jand. . Published: 

»V(7^\Ihvi»tor/Appllcanf. (for US only)v CLISHf Clary . — (witiiput inte^ 
\ : ^ upon receipt 

i -(74) Agent: BRODOWSKI, Michael, - tt; Testa, >Hurwitz For two-letter codes and otjier Mreviitiaiis, refer to tlie "Gu& 

i . f& fiiibeaujt,;!^ High Street Tower;; 125 ifigh Street, once Notes on CodexandAbbrm 



Bbstdn/MA 02110 (US). - 



ningtfeacliK 




(54) Title: METHODS AND SYSTEMS FOR PROFILING BIOLOGICAL SYSTEMS

(57) Abstract: Methods and systems are disclosed for developing profiles of a state of a biological system based on the discernment
of similarities, differences, and/or correlations between a plurality of data sets that are derived from one or more biomolecular
component types, one or more biological sample types, and/or one or more types of measurements.



WO 2005/020125 



PCT/US2004/027022 



Methods and Systems for Profiling Biological Systems 

This application claims priority to and the benefit of U.S. Provisional Patent Application 
Serial No. 60/496,657, filed on August 20, 2003, the entire disclosure of which is incorporated 
by reference herein. 

Field of the Invention

The invention relates to the field of data processing and evaluation. More particularly,
the invention relates to methods and systems for profiling a state of a biological system, e.g., a
mammal such as a human.

Background 

Current approaches to understanding biology, such as genomics and proteomics, typically
focus on a single aspect of a biological system at any one time. The "omics" technology
revolution, particularly that of genomics, has provided a basis for studies of a single type of
biomolecule both in single cell organisms, e.g., yeast, and in simple, multi-cellular systems, such
as sea urchin embryos. In both types of studies, the systems are perturbed by environmental
changes and/or genetic manipulation to enable the correlation of gene expression changes in a
number of different scenarios. Construction of in silico interaction networks is facilitated by
looking at interdependencies between and among genes from several different perspectives.
However, while modern quantitative genomic technologies are readily available, the resulting
information may be of low precision and utility. For example, in one sea urchin study, a
perturbation was deemed significant only if it gave rise to a three-fold or greater change in gene
expression. Although a number of experimental factors might contribute to the net variability in
a system and reduce precision, a significant biological effect may be manifested by a change that
occurs well under a three-fold cut-off.
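
The point about the three-fold cut-off can be illustrated with a small numerical sketch. The expression values below are hypothetical, and the choice of Welch's t statistic as the measure of consistency is an assumption made here for illustration; the text above does not prescribe a particular test.

```python
import math
import statistics

def fold_change(control, treated):
    """Ratio of group means (treated / control)."""
    return statistics.mean(treated) / statistics.mean(control)

def welch_t(control, treated):
    """Welch's t statistic for two independent samples with unequal variances."""
    m1, m2 = statistics.mean(control), statistics.mean(treated)
    v1, v2 = statistics.variance(control), statistics.variance(treated)
    return (m2 - m1) / math.sqrt(v1 / len(control) + v2 / len(treated))

# Hypothetical expression levels (arbitrary units) for one gene.
control = [100.0, 104.0, 98.0, 101.0, 97.0]
treated = [150.0, 155.0, 148.0, 152.0, 145.0]

fc = fold_change(control, treated)   # 1.5-fold increase
t = welch_t(control, treated)        # large t: the shift is highly consistent
passes_threefold = fc >= 3.0 or fc <= 1.0 / 3.0

print(f"fold change = {fc:.2f}, Welch t = {t:.1f}, passes 3-fold cut-off: {passes_threefold}")
```

Here a reproducible 1.5-fold change would be discarded by a three-fold filter even though the two groups barely overlap.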

Analyzing and understanding a complex, multi-cellular organism, such as a mammal, is
much more complicated. When studying the state of a complex biological system, one must take
into account the multi-compartmental character of the system, not to mention the variety of cell
and tissue types that will have unique gene expression and protein and metabolite levels.
Current studies that rely on the analysis of a single aspect of a biological system, e.g., a single
type of molecule or target, usually are not robust enough to understand the entire biological
system or subsystem that may be involved in a particular mode of disease.

An important challenge in the understanding of a biological system of a mammal and the
development of new drugs for complex, multi-factorial diseases is the identification and
validation of biomarkers/surrogate markers. Moreover, it appears that instead of single
biomarkers being indicative of a state of a biological system, biomarker patterns or biomarker
sets may be necessary to characteri^ and d^ 

system, ^ ^OT mult^lb leyels of the biologi^system are ^ultaneoudy. considered*in the 
analysis. Accordingly, there is a need for methods and systems that consider a biological system 
10 as jrtjf^ — 
dewlbpmeift^ 

Summary of the Invention

The appHcanfs of "systems 
;bidlpgyi* to 

is the study of biology as an integrated biological system including genetic, protein and
n^ 

artifici^^pli^g>the inherent complexity ofbiolog^ 

ofato^l^oig^ 
d^l^SpSnsi^tto 

20 ihterdep^dencies contained w By appropriately 

considering the complexity of a biological system, a skilled artisan can undertake biological
research at the systems level, developing a profile of a biological system which
provides insight into the biological system as a whole.

The application describes methods and systems to analyze complex clinical samples of
mammals including humans at a biological systems level to provide new information about the
state of a biological system that was previously unobtainable through traditional chemistries or
genomics alone. Using the methods and systems described herein, it is possible to gain insight
into biological mechanisms of disease and drug response. More specifically, the
methods and systems can analyze and integrate data at the biomolecular component type level,
i.e., the gene/gene transcript, protein and metabolite level, to create knowledge that advances
pharmaceutical research and development by providing new insights into the molecular
mechanisms of health and disease, which further the development and discovery of novel
therapeutics to treat human disease.






To develop a profile of a state of a biological system, e.g., a disease state, multiple
measurements on complex biological samples are performed. Subsequently, comprehensive
gene, gene transcript, protein, and/or metabolite profiling coupled with correlation analysis and
network modeling provides insight into a biological system at a systems level so that
connections, correlations, and relationships among thousands of diverse, measurable molecular
components can be achieved. Such knowledge then may be used directly for the development of
therapeutic agents or biomarkers, may be used in combination with clinical information, and/or
may serve as a basis for directed, hypothesis-driven experiments designed to further elucidate
pathophysiologic mechanisms. Further, tracking changes of a profile of a biological system can
improve many aspects of pharmaceutical discovery and development, including drug safety and
efficacy, drug response, and the etiology of disease.

The application addresses limitations in existing techniques by providing a
method and system, or a "technology platform," with the ability to integrate a plurality of data
sets, which may include two or more biomolecular component types, to elucidate information
conveying associations between or among components or networks of interactions among
components. The methods and systems utilize statistical analyses of a plurality of data sets, e.g.,
spectrometric data, to develop a profile of a state of a biological system, e.g., a mammal such as
a human. The data sets comprise multiple measurements of the biological system and are
derived from three primary sources: a biological sample type, a measurement technique, and a
biomolecular component type. The application further describes a technology platform that
facilitates the discernment of similarities, differences, and/or correlations not only within a single
biomolecular component type within a sample or biological system, but also across two or more
biomolecular component types.
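
As an illustration of discerning correlations across two biomolecular component types, the following minimal sketch computes Pearson correlations between every feature of one block (here, hypothetical peptide intensities) and every feature of another (hypothetical lipid intensities) measured on the same samples. The feature names, the data, and the 0.9 cut-off are assumptions for illustration only and are not taken from the application.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_type_correlations(block_a, block_b, threshold=0.9):
    """Return (name_a, name_b, r) for feature pairs with |r| >= threshold."""
    hits = []
    for name_a, values_a in block_a.items():
        for name_b, values_b in block_b.items():
            r = pearson(values_a, values_b)
            if abs(r) >= threshold:
                hits.append((name_a, name_b, round(r, 3)))
    return hits

# Hypothetical measurements on the same five samples.
proteins = {"peptide_1": [1.0, 2.0, 3.0, 4.0, 5.0],
            "peptide_2": [2.0, 1.0, 2.0, 1.0, 2.0]}
metabolites = {"lipid_1": [2.1, 3.9, 6.2, 8.0, 9.8],
               "lipid_2": [5.0, 4.0, 3.0, 2.0, 1.0]}

edges = cross_type_correlations(proteins, metabolites)
for edge in edges:
    print(edge)
```

Each returned pair can be read as one candidate edge in a correlation network linking the two component types.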

In a broad aspect, a method of profiling a state of a biological system includes evaluating
with statistical analysis a plurality of data sets of a biological system and comparing features
among the plurality of data sets to determine one or more sets of differences among at least a
portion of the plurality of data sets. The action of comparing the features among the plurality of
data sets can include direct comparison of one feature in a first data set to a corresponding
feature in another data set. The action of comparing the features also can include correlating or
associating features between or among data sets such as correlations associated with and/or
resulting from the statistical analysis, e.g., multivariate analysis. Based on the results of the
evaluation and comparison, a profile for a state of the biological system can be developed.




Another method of profiling a state of a biological system in a mammal includes
evaluating wth statistical analysis a jiluMly of data s 

comparing features among the plurality of data sets to determine one or more sets of differences
among at 1^ a p^ of idata seCs; : evaluaiing with statistical analysis^ 
.5 ghratiiytf among 
the plurality of data sets to determine one or more sets of differences among at least a portion of
lhe|iKi^ 

profile for a state of the biological system.

A further method of profiling a state of a biological system in a mammal includes
v 10 e&duatihgw^ 

twpbiomolecular ^o^ tKe>j^ o£ data sets to 

d^temuneroceor 

and:deyelppmgaprp&e^ 

described analysts. 

• .: r get^V Thepli^^ • - .. 

con^ 

techhi^ii^ 
20 maimmal, such as a humane, Abiomoiecu^^ 

•genera gene transcript^ a metabolite. 

A biological sample type includes, among others, blood, plasma, serum, cerebrospinal

fluid, hfife, saliva 

na^fiiili ocular fluid v m 
cells, endothelial cells, kidney cells, prostate cells, blood cells, lung cells, brain cells, skin cells,

adippse>cells, tumor cells, and mamm^^ 

biological sample type that is treated differently* or from one biological sample type that is 
:cdllected or analyzed at diff^ent t^ 

A measurement technique includes, among others, liquid chromatography, gas
30 chromat6^phy,'M^perfo 

spectrometry, liquid chromatography-mass spectrometry, gas chromatography-mass
spectrometry, high performance Hiquid chromatogr^^ 

electrophoresis-mass spectrometry, nuclear magnetic resonance spectrometry, parallel






hybridization assay, parallel sandwich assay, and competitive assay. Data sets can include
measurements from different instrument configurations of a single type of measurement
technique.

Subsequent to developing a profile for the state of a biological system, the profile can be
compared to a profile of another state of a biological system, where the biological systems are
the same or different. A profile also can be compared to a database of profiles to evaluate
whether the state of a biological system matches or is similar to a known state. The methods
described herein may be carried out by an article of manufacture having a computer-readable
medium with computer-readable instructions embodied thereon for performing the methods,
idad^^esof^ 

,tk>a, and claims, all of which illustrate the principles of the invention by 
way of example only. 
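
The comparison of a profile against a database of profiles, mentioned above, could be sketched as a nearest-match search. The sketch below assumes that each profile is a fixed-length vector of feature intensities and uses cosine similarity with a 0.95 cut-off; the similarity measure, the cut-off, and the example vectors are all hypothetical choices, not details given in the application.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def best_match(profile, database, min_similarity=0.95):
    """Return (state, similarity) for the closest known profile.

    The state is None when even the closest profile falls below the cut-off.
    """
    state, similarity = max(
        ((name, cosine_similarity(profile, ref)) for name, ref in database.items()),
        key=lambda pair: pair[1],
    )
    return (state if similarity >= min_similarity else None, similarity)

# Hypothetical database of known-state profiles and a query profile.
database = {
    "healthy": [1.0, 0.2, 0.1, 0.9],
    "diseased": [0.2, 1.0, 0.8, 0.1],
}
query = [0.25, 0.9, 0.85, 0.15]

state, similarity = best_match(query, database)
print(state, round(similarity, 3))
```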

Brief Description of the Figures

The foregoing and other objects, features, and advantages of the invention described
above will be more fully understood from the following description of various illustrative
embodiments. In the drawings, like
reference characters generally refer to the same parts throughout the different views. The
drawings are not necessarily to scale, and emphasis instead is generally placed upon illustrating
the principles of the invention.
Figure 1 is a schematic flow diagram illustrating the integration of genomic, proteomic,
metabolomic and clinical data sets to develop a profile of a biological system.

Figure 2 is a flow diagram of various analytical and processing steps as applied to a
plurality of data sets according to an illustrative embodiment of the invention.

Figure 3 illustrates the experimental design of the ApoE3-Leiden transgenic mouse gene
expression experiment.

Figure 4 illustrates a significance plot for the gene expression experiment.

Figure 5 illustrates a significance plot for the selected 1059 peptide peaks from four liver
fractions.

Figure 6 illustrates a block design for the synthetic data GIST experiment.

Figure 7 illustrates scatter plots and a normal probability plot for variety 1 of the
synthetic GIST data set.

Figure 8 illustrates scatter plots and a normal probability plot for variety 2 of the
synthetic GIST data set.




Figure 9 illustrates scatter plots and a normal probability plot for variety 3 of the
synthetic GIST data set.

Figure 10 illustrates a significance plot for the synthetic GIST data set.

Figure 11 illustrates a flow diagram that describes the treatment of the gene expression
data derived from a biological sample.

Figure 12 illustrates a flow diagram that describes the treatment of the protein data
derived from a biological sample.

Figure 13 illustrates a flow diagram that describes the treatment of the metabolite data
derived from a biological sample.
flO: Ffpirerl4 illustrates a flow diagram that describes foe^i^ 

-setsderi^ 

Figure; 15 ilhisteates a gene expression analysis t^ 

Figure 1 6 jliustrates rc^te for select gipups^m^ analysis, 
Figure 17 illustrates results for selected groups from a gene expression analysis.
15; Figuis l & M^ total ion chromatograms of proteir^s from . 

plasma samples- 

Figure l?iUra1ra^ profiHng of protedns from 

iplasmasainples. 

Figure 20- iHu^ 
20 fivetr^genic 

Figure 21 illustrates 1H NMR spectra of metabolites extracted from plasma from
transgenic and wildtype mice.

Figure 22 illustrates mass chromatograms of plasma lipids recorded using LC/MS for
transgenic and wildtype mice.

Figure 23 illustrates individual gene, protein, and metabolite spectra that are normalized
and then concatenated to form a single factor spectrum for comparison across individual
biomolecular component types.

Figure 24 illustrates clustering of wildtype and transgenic mice data resulting from
Principal Component and Discriminant ("PC-DA") statistical analysis.

Figure 25 illustrates a difference factor spectrum of peptides exhibiting significant
differences (note m/z value 1366).






Figure 26 illustrates a mass spectrum and a sequence of a peptide (m/z value 1366) from
mouse plasma recorded using LC/MS/MS, where the peptide deduced from the MS/MS
spectrum is identified as residues 57-79 in the sequence of human apolipoprotein E3.

Figure 27 illustrates a correlation network between biomolecular component types.

Figure 28 illustrates a map of known relations between the correlation network
associations and published information.

Figure 29 'iifaistia^-lSFlcal ■**ci!flBferi^ass< , * *<loliveBailBs,»' in. teems «f Inomaitas 
("Markers") or therapeutic agents fbrt can be derived from a systems biology analysis. 

Figure 30A illustrates the experimental design of the ApoE3-Leiden transgenic mouse
experiment.

Figure 30B illustrates a scatter plot of the cDNA microarray data.

Figure 31A illustrates the LC/MS chromatograms for the digested liver protein fraction
for the ten samples.

Figure 31B illustrates the clustering analysis of the tryptic peptide profiles.

Figure 31C illustrates a factor spectrum of the liver protein data.

Figure 32A illustrates the clustering resulting from the principal component analysis of
the liver lipid data set.

Figure 32B illustrates a factor spectrum of the liver lipid data set.

Figures 33A, 33B, and 33C illustrate a comprehensive systems analysis based on data
from three biomolecular component types, where a relative abundance of 1.0 is 100%. (Figure
33A - mRNA; Figure 33B - protein; Figure 33C - lipid).

Figure 34 is a schematic illustrating hyperlipidemia and atherosclerosis in a blood vessel.

Figure 35 illustrates a whole plasma parallel proteo-metabolic profiling scheme.

Figure 36 illustrates NMR spectra for a wildtype mouse plasma sample (WT) and a
transgenic mouse plasma sample (TG).

Figure 37 illustrates a PC-DA score plot showing clustering of NMR data for the
transgenic mouse, represented by triangles, and the wildtype (or control) mouse, represented by
circles.

Figure 38 illustrates a difference spectrum characterized by a number of lines
representing various metabolic components.

Figure 39 illustrates total ion chromatograms (TIC's) for deproteinated lipid fractions
from transgenic (TG) mice and wildtype (WT) mice analyzed by a 4-step gradient in the LC
dimension with mass spectrum acquired over 200-1700 m/z mass range.




Figure 40 illustrates total ion chromatograms from transgenic (TG) mice and wildtype
(WT) mice protein fractions obtained from tryptic peptides.

Figure 41 illustrates a score plot showing PC-DA clusters for the wildtype (WT) and
transgenic mouse (TG).

Figure 42 illustrates difference factor spectra for protein and metabolite components.

Figure 43 illustrates a schematic representation of data analysis workflow.
Figure 44 illustrates tfifewoikflow 
pliatfdmis. 

Figure 44Affl^^ 

10 revealing^b m: dis^ ct clusters. ? ^. . — r ^ • 

Figure 44B illustrates COSA unsupervised clustering of multiple data sets that have been
concatenated.

Figure 45 iUusti^ comppnehte of one 

sample that aj^ di^ 
; 15 45 A ilk grkph of seized ^ 

d^erenras between:^ 

Figure 46 fflustrates a correlation ^ Ae comparison be^een drug-treated 

diseased rodents andv^cile-trea^ 

Figure 474^ 

components in the drug-treated diseased rodents and vehicle-treated diseased rodents (drug effect
on disease).

Fi^e 48 illustrates aplot^ bftiiepeak 
intensity values within each-group (after^^ related to peptides from 

certain proteins* 
25 Figure49 iUustxates CQSA distance cluster^ 

Figure 50 illustrates the workflow for a comparison and correlation of human sample data
with non-human sample data.

Figure 50A illustrates the results of a COSA analysis of human serum samples in which
the input data set used for classification consisted of 366 lipid peaks chosen from the rodent
model of the human disease.

Figure 51 illustrates the success rate of an SVM linear classifier as a function of number 
of lipid peaks. 






Figure 52 illustrates a comparison of lipid abundance changes and correlations across 
human and rodent species. 

Figure 53 illustrates the workflow for analysis of several data sets. 
Figure 54 illustrates a graphical representation of selecting analytes for a biomarker. 
Figure 55 illustrates the performance of a fifteen analyte biomarker in grouping samples.

Figure 56 illustrates the list of analytes from Figure 55. 

Detailed Description of the Invention 

The methods and systems disclosed herein rely on multiple measurements of biological
samples, including analysis of metabolites, proteins, genes and gene transcripts, to permit a
skilled artisan to understand a biological system in greater depth than an approach that examines
only one of these factors. Understanding the biological system as a whole can improve multiple
aspects of pharmaceutical discovery and development including drug safety and efficacy, drug
response, and the etiology of disease. As described herein, a systems biology platform can
integrate genomics, proteomics, metabolomics, and bioinformatics, and results in a data
integration and knowledge management platform that generates connections, correlations, and
relationships among thousands of measurable molecular components to develop a profile of a
state of a biological system. Resulting profiles can be combined with clinical information to
increase the knowledge of a state of a biological system.

A "profile" of a biological system is a summary or analysis of data representing
distinctive features or characteristics of the biological system, e.g., of a mammal such as a
human. The data can include measurements or features derived from a biological sample type, a
type of measurement technique, and a biomolecular component type. The data often are spectral
or chromatographic features that are in the form of a graph, table, or some similar data
compilation. A profile typically is a set of data features that permit characterization of a state of
a biological system.

A profile can be considered to include one or more "biomarkers" of a biological system.
A biomarker generally refers to a biological component type, e.g., a gene, a gene transcript, a
protein or a metabolite, whose qualitative and/or quantitative presence or absence in a biological
system is an indicator of a biological state of a mammal. Thus, a profile can be considered to
be a set of distinctive biomarkers, e.g., spectral or chromatographic features, that permit
characterization of a state of a biological system. A profile also can be considered to include
correlations and other results of analyses of the data sets, e.g., causality. Thus, a profile can
comprise a plurality of different elements as described above, or can comprise only one of these
elements, e.g., biomarker(s).

A: w sj^pfaKoJqgi system 
exists, either ^natuMly w Examples ofa state'of a biological sy&emiriciude; 

5 birtare not Iiirii 

Response, a toraq^^ regulation (e.g., apc^as^ an age ^ re^pbiise, an, 

en\^nmental r^ons£ and a stress response. Mie^ 

wHch inclu mammals such asmice^rats^ guSieiapigs, dqgs,jcats,< 

mohke^and^^eut. 

10 , . Aprbfilfefof^ .— 

ariotiherpro^ arein fee same state,:e;g., a healthy or a 

diseased sktet^ 
than^ngn^ 

^ebidiogic^ sy^fem different sources 

15 : ; ]in a sm 
more 

^ : %de\$^ed%^ 
p^oitfrc^^ 

A'%iqmo^^ 
20 Hvffial^ 

int^hahgeably isferred^tfhere^ component types that generally 

; ^e associate 

system is referred to as genomics or functional genomics. Proteins and their constituent peptides
(viMchmay b^interchangea^ 
25 component type that generally is asscwiat^ yidth protein 

the4eyel ofthete ^Glycoproteins also are considered, 

a biomolecular component type. Another example of avbipmplecular coinppnent type-is 
metaboHtes which generally are associated 

a level of a biological system referred to as metabolomics. Metabolites include, but are not
limited to, lipids, steroids, amino acids, organic acids, bile acids, eicosanoids, neuropeptides,
villm^^ carbohydrates, iortic ofganics* nucleotides; inorganics,_ xenobiotics, 

peptides, trace elements, and pharmacophore and drug breakdown products.






The methods described herein may be used to develop a profile of a state of a biological
system based on any single biomolecular component type as well as based on two or more
biomolecular component types. Profiles of biomolecular component types facilitate the
development of comprehensive profiles of different levels of a biological system, e.g., genome
profiles/transcriptomic profiles, proteome profiles and metabolome profiles, and permit their
integration and analysis. That is, the methods may be used to analyze measurements derived
from one or more biological sample types, one or more types of measurement technique, or a
combination of at least one each of a biological sample type and a measurement technique so as
to permit the evaluation of similarities, differences, and/or correlations in a single biomolecular
component type or across two or more biomolecular component types. From these
measurements, better insight into biological mechanisms may be gained, novel
biomarkers/surrogate markers may be detected, and intervention routes may be developed.

A "biological sample type" includes, but is not limited to, blood, blood plasma, blood
serum, cerebrospinal fluid, bile acid, saliva, synovial fluid, pleural fluid, pericardial fluid,
peritoneal fluid, sweat, feces, nasal fluid, ocular fluid, intracellular fluid, intercellular fluid,
lymph, urine, tissue, liver cells, epithelial cells, endothelial cells, kidney cells, prostate cells,
blood cells, lung cells, brain cells, adipose cells, tumor cells, and mammary cells. The sources of
biological sample types may be different subjects; the same subject at different times; the same
subject in different states, e.g., prior to drug treatment and after drug treatment; different sexes;
different species, e.g., a human and a non-human mammal; and various other permutations.
Further, a biological sample type may be treated differently prior to evaluation such as using
different work-up protocols.

A "measurement technique" refers to any analytical technique that generates or provides
data that is useful in the analysis of a state of a biological system. For example, measurement
techniques include, but are not limited to, mass spectrometry ("MS"), nuclear magnetic
resonance spectroscopy ("NMR"), liquid chromatography ("LC"), gas chromatography ("GC"),
high performance liquid chromatography ("HPLC"), capillary electrophoresis ("CE"), gel
electrophoresis ("GE") and any known form of hyphenated mass spectrometry in low or high
resolution mode, such as LC/MS, GC/MS, CE/MS, MS/MS, MS^n, and other variants.
Measurement techniques include biological imaging such as magnetic resonance imagery
("MRI"), video signals, and an array of fluorescence, e.g., light intensity and/or color from
points in space, and other high throughput or highly parallel data collection techniques.






Measurement techniques also include optical spectroscopy, digital imagery,
oUgpnucleotide array hyhridi^oii, prpteinana 

("gene c^s 9 ^^imim^ chain reaction, nucleic acid 

K^ndii^£bn, electrdc^ft^ emission tomography, 

aind Siriijfec^^ "For a particular 

analy^d^ mayinclM^d configurations or 

:settiii^ relying Id ^ 

jA^^a^^en^ 
technique. A "diataset" Muciesffleasuremente^ 
ijfeka mple/adat a 
coU^edby^esametecHm 
Further;;data setis ^ 

data, gene expression data, 'metal)olite concentration data, magnetic resonance imaging data, 
ele^^ , 
^biSqgi^ system being 

sj^ieft^ 

A c 'jSiB^[tl3I^? , QtaH3^^ s^refedES to lyi^c^^^^ date set 

thatii^ data set For example* : atprdfild tj^ic^y is a set of data 

features tb^|>ea^^ 

©ata sets may iei&to;sultoantia^ 
more measurement techniques; ;For exaii^^the 

measurements of different sam As a result a 

first data set may refer to experimental group sample measurements and a second data set may
refer to control group sample measurements. In addition, data sets may refer to data grouped
based on any other classification considered relevant. For example, data associated with the
spectrometry measurements of a single sample source may be grouped into different data sets
based on the instrument used to perform the measurement, the time a sample was taken, the
appearance of the sample, or other identifiable variables and characteristics.
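
The grouping of measurements into data sets by a chosen classification can be sketched as below. This is only an illustration: the record fields (`group`, `instrument`, `spectrum`) and the sample values are hypothetical, and the same measurements are simply regrouped under two different keys.

```python
from collections import defaultdict

def group_into_data_sets(measurements, key):
    """Group measurement spectra into data sets keyed by one classification variable."""
    data_sets = defaultdict(list)
    for measurement in measurements:
        data_sets[measurement[key]].append(measurement["spectrum"])
    return dict(data_sets)

# Hypothetical measurement records; field names are assumptions for illustration.
measurements = [
    {"sample": "s1", "group": "experimental", "instrument": "LC/MS-A", "spectrum": [0.1, 0.9]},
    {"sample": "s2", "group": "control", "instrument": "LC/MS-A", "spectrum": [0.2, 0.4]},
    {"sample": "s3", "group": "experimental", "instrument": "LC/MS-B", "spectrum": [0.1, 0.8]},
]

by_group = group_into_data_sets(measurements, "group")
by_instrument = group_into_data_sets(measurements, "instrument")
print(sorted(by_group), sorted(by_instrument))
```

Grouping by a different key yields overlapping data sets drawn from the same underlying measurements, which is one way a data set can contain a sub-set of another.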

Accordingly, one data set may include a sub-set of another data set. For example, a
grouping based on appearance of the sample may include one or more experimental group data
sets. Where the measurement technique is NMR, a data set may include one or more NMR
spectra. Where the measurement technique is UV spectroscopy, a data set may
include one or more UV emission or absorption spectra. Similarly, where the measurement
technique is MS, a data set may include one or more mass spectra. Where the measurement
technique is a chromatographic-MS technique, like LC/MS or GC/MS, a data set may include
one or more mass chromatograms. Alternatively, a data set of a chromatographic-MS technique
may include one or more total ion current ("TIC") chromatograms or reconstructed TIC
chromatograms. In addition, it should be realized that the term "data set" includes both raw
spectrometric data and data that has been preprocessed, e.g., to remove noise, to correct a
baseline, to smooth the data, to detect peaks, and/or to normalize the data.

"Spectrometry data" refers to any data that may be represented in the form of a graph,
table, vector, array or some similar data compilation, and may include data from any
spectrometric or chromatographic technique. The term "spectrometric measurement" includes
measurements made by any spectrometric or chromatographic technique.

Central to the methods disclosed herein is the statistical analysis of a plurality of data
sets. "Statistical analysis" includes parametric analysis, non-parametric analysis, univariate
analysis, multivariate analysis, linear analysis, non-linear analysis, and other statistical methods
known to those skilled in the art. Multivariate analysis, which determines patterns in apparently
chaotic data, includes, but is not limited to, principal component analysis ("PCA"), discriminant
analysis ("DA"), PCA-DA, canonical correlation ("CC"), cluster analysis, partial least squares
("PLS"), predictive linear discriminant analysis ("PLDA"), neural networks, and pattern
recognition techniques.
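
To make the idea of principal component analysis concrete, the following minimal sketch extracts only the first principal component of a small samples-by-features matrix. It uses power iteration on the centered data, a technique chosen here for brevity and not taken from the application; the function names and the wildtype/transgenic example values are hypothetical. In practice a numerical library would normally be used instead.

```python
import math

def center(rows):
    """Subtract the per-feature mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def first_principal_component(rows, iterations=200):
    """Leading eigenvector of X^T X (X centered) by power iteration.

    Returns (loadings, scores), where scores are the projections of each sample.
    """
    x = center(rows)
    p = len(x[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]              # X v
        w = [sum(s * row[j] for s, row in zip(scores, x)) for j in range(p)]    # X^T (X v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return v, scores

# Hypothetical feature intensities: three wildtype samples, then three transgenic samples.
samples = [
    [1.0, 2.0, 5.0], [1.2, 2.1, 5.1], [0.9, 1.9, 4.9],
    [3.0, 4.0, 5.0], [3.1, 4.2, 5.1], [2.9, 3.9, 4.9],
]

loadings, scores = first_principal_component(samples)
print([round(s, 2) for s in scores])
```

In this toy data the two groups fall on opposite sides of zero along the first component, which is the kind of clustering a score plot is meant to reveal.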

Of course, before performing multivariate analysis, the raw data may be preprocessed to
assist in the comparison of different data sets. In particular, to compare data across different
biomolecular component types, appropriate preprocessing should be performed. Preprocessing of
the data may include (i) aligning data points between data sets, e.g., using partial linear fit
techniques to align peaks of spectra of different samples; (ii) normalizing the data of the data
sets, e.g., using standards in each measurement to adjust peak height; (iii) reducing the noise
and/or detecting peaks, e.g., setting a threshold level for peaks so as to discern the actual
presence of a species from potential baseline noise; and/or (iv) other data processing techniques
known in the art. Data preprocessing can include entropy-based peak detection as disclosed in
U.S. Patent No. 6,743,364, and partial linear fit techniques (such as found in J.T.W.E. Vogels et
al., "Partial Linear Fit: A New NMR Spectroscopy Processing Tool for Pattern Recognition
Applications," Journal of Chemometrics, vol. 10, pp. 425-38 (1996)).
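
Two of the preprocessing steps listed above, normalization to a standard and threshold-based peak detection, can be sketched as follows. This is a simplified illustration under stated assumptions: the internal standard is taken to sit at a known index, peaks are simple local maxima above a fixed threshold, and the intensity values are hypothetical. It does not implement the entropy-based peak detection or partial linear fit alignment cited in the text.

```python
def normalize_to_standard(intensities, standard_index):
    """Scale a spectrum so that the internal-standard peak has height 1.0."""
    reference = intensities[standard_index]
    if reference <= 0:
        raise ValueError("internal standard intensity must be positive")
    return [value / reference for value in intensities]

def detect_peaks(intensities, threshold):
    """Indices of local maxima whose intensity exceeds the threshold."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        current = intensities[i]
        if current > threshold and current > intensities[i - 1] and current >= intensities[i + 1]:
            peaks.append(i)
    return peaks

# Hypothetical raw spectrum; index 5 is assumed to be the internal standard.
raw = [2.0, 3.0, 50.0, 4.0, 2.0, 100.0, 3.0, 6.0, 2.0]

normalized = normalize_to_standard(raw, standard_index=5)
peaks = detect_peaks(normalized, threshold=0.1)
print(peaks)
```

The small bump at index 7 is a local maximum but stays below the threshold, so it is treated as baseline noise.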

Throughout the description, where compositions are described as having, including, or
comprising specific components, or where processes are described as having, including, or
comprising specific process steps, it is contemplated that compositions of the present invention
also consist essentially of, or consist of, the recited components, and that the processes of the
present invention also consist essentially of, or consist of, the recited processing steps.

It should be understood that the order of steps or order for performing certain actions is
immaterial so long as the invention remains operable, i.e., a profile of a biological system is
developed. Moreover, two or more steps or actions may be conducted simultaneously.

The methods described herein generally include evaluating a plurality of data sets of a
biological system [illegible] to determine one or more sets of differences among at least a
portion of the data sets so as to develop a profile [illegible] on the comparison. In some
embodiments, the data sets are derived [illegible] and include measurements [illegible]; in other
embodiments, the data sets are derived from two or more biological sample types and include one
or more different types of spectrometric measurements [illegible] analysis. In other embodiments
[illegible] of data sets [illegible] statistical analysis. For example, a profile may be developed
by separately evaluating a plurality of data sets including [illegible] and a plurality of data
sets including measurements derived from metabolites in the biological system, then evaluating
with statistical analysis the results of the individual analyses [illegible] profile.
Alternatively, [illegible] plurality of data sets relating to proteins and metabolites
[illegible] simultaneously evaluated with statistical analysis.

Analogously, a profile can be developed from data sets including measurements derived
from a protein and a gene; a protein and a gene transcript; a gene and a gene transcript; a gene
and a metabolite; and a gene transcript and a metabolite. A profile also can be developed from
data sets including measurements derived from a protein, a gene, and a gene transcript; a protein,
a gene and a metabolite; a protein, a gene transcript and a metabolite; a gene, a gene
transcript, and a metabolite; and a protein, a gene, a gene transcript and a metabolite. In
addition, each of the above permutations can include, in addition or as a substitution, a
glycoprotein.

Measurements for a particular biomolecular component type usually use a
measurement technique or techniques that are often used and known in the art for that particular
biomolecular component type. For example, an analysis of metabolites may use NMR, e.g.,
¹H NMR; LC/MS; GC/MS; and MS/MS. Analysis of other biomolecular component types may use
LC/MS; GC/MS; and MS/MS.

In one embodiment, the method generally includes selecting a biological sample;
preparing the biological sample based on the biochemical components to be investigated and the
spectrometry techniques to be employed; measuring the components in the biological samples
using spectrometric and chromatographic techniques; measuring selected molecule subclasses
using NMR and [illegible] to study compounds; preprocessing the raw data; using
statistical analysis, which will be described in more detail [illegible], [illegible] preprocessed
data to identify patterns [illegible] subclasses of molecules and/or in measurements
of components using NMR or MS; and using [illegible] to combine data sets from
distinct experiments and identify patterns of interest in the data.
[illegible] comparison of the data across biomolecular component types. The invention also
provides techniques for [illegible] suitable data sets using linear, nonlinear or other
mathematical tools. Moreover, using these associations and/or correlations to postulate networks
of interacting biomolecular components, to determine causality among these associations, and to
establish hypotheses about the biological processes underlying the observations which give rise
to the data sets, is still another aspect of the methods and systems described herein.

The application also provides an article of manufacture where the functionality of a
method disclosed herein is embedded on a computer-readable medium such as, but not limited
to, a floppy disk, a hard disk, an optical disk, a magnetic tape, a PROM, an EPROM, CD-ROM,
or DVD-ROM. The functionality of the method may be embedded on the computer-readable
medium in any number of computer-readable instructions or languages such as FORTRAN,
PASCAL, C, C++, BASIC and assembly language. Further, the computer-readable instructions
may be written in a script, macro, or functionally embedded in commercially available software
such as EXCEL or VISUAL BASIC. In other aspects, the application provides systems adapted
to practice the methods described herein.

The data processing device may include an analog and/or digital circuit adapted to
implement the functionality of one or more of the methods disclosed herein using at least in part
information provided by the spectrometric instrument. In some embodiments, the data
processing device may implement the functionality of the methods described herein as software
on a general-purpose computer. In addition, such a program may set aside portions of a
computer's random access memory to provide control [illegible] the spectrometric
measurement acquisition, statistical analysis of data sets, and/or [illegible]
biological system. In such an embodiment, the program [illegible] written in any one of a number
of high-level [illegible]. Further, the program can be written in a script, macro, or functionally
embedded in proprietary software or commercially [illegible]. Additionally, the
software [illegible] language directed to a microprocessor [illegible] a computer. For example,
the software can be implemented in Intel 80x86 assembly language if it is [illegible] embedded on
an article of manufacture [illegible] floppy disk, a hard disk, an optical disk, a magnetic tape,
a PROM, an EPROM, or CD-ROM.
As shown [illegible] such as patterns [illegible] activity and metabolite dynamics. The
methods disclosed [illegible] employ [illegible] metabolites, [illegible] to an understanding of
their biochemical interaction to elucidate a profile of a biological [illegible]. This
information [illegible] then are used to assemble molecular networks and place compounds
[illegible] so as to develop a profile of a state of the [illegible].
Figure 2 shows a flow chart of one embodiment of an analytical method 200. It should
be understood that one or more of the steps described below can be omitted and/or the order of
steps can be changed, so long as the embodiment remains capable of developing a profile of a
biological system. One or more data sets 205 taken from two or more
biomolecular [illegible] to further
data analysis. In a preferred embodiment, the initial processing step typically includes
concatenating one or more of the plurality of data sets. This initial preprocessing step may also
include integrating together the data sets based on a suitable schema of data hierarchy. In some
embodiments, the initial processing step includes both a concatenation step and an integration
step. The initial processing optionally may include, follow, or precede various forms of
preprocessing including, but not limited to, data smoothing, noise reduction, baseline correction.
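The concatenation part of the initial processing step can be illustrated as follows. This is a sketch under stated assumptions rather than the disclosed implementation: each data set is assumed to be a dictionary keyed by sample identifier, and only samples measured in every data set are kept. The function name, the labels, and the values are hypothetical.

```python
def concatenate_data_sets(data_sets):
    """Join per-sample measurement vectors from several data sets.

    data_sets maps a component-type label to {sample_id: [values, ...]}.
    Only samples present in every data set are kept.
    """
    labels = sorted(data_sets)
    shared = set.intersection(*(set(data_sets[label]) for label in labels))
    variable_names = []
    for label in labels:
        any_vector = next(iter(data_sets[label].values()))
        variable_names += [f"{label}_{j}" for j in range(len(any_vector))]
    combined = {
        sample: [v for label in labels for v in data_sets[label][sample]]
        for sample in sorted(shared)
    }
    return variable_names, combined


# Hypothetical measurements for three animals.
example_sets = {
    "protein": {"m1": [1.0, 2.0], "m2": [1.5, 2.5]},
    "metabolite": {"m1": [0.3], "m2": [0.4], "m3": [0.5]},
}
names, combined = concatenate_data_sets(example_sets)
```

Keeping the variable names alongside the concatenated vectors makes it possible to trace a later statistical result back to the component type it came from.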



The data sets that are the subject of the initial preprocessing step may include any
measurable or quantifiable aspect of the biological system being studied. For example, the data
sets may represent collections of, e.g., protein expression data, gene expression data, metabolite
concentration data, magnetic resonance imaging data, electrocardiogram data, genotype data,
and/or single nucleotide polymorphism data. Statistical methods such as principal component
analysis may be utilized to convert the data sets to factor spectra, which are simply a processed
form of the raw data.

Means for comparing data sets of completely unrelated phenomena with disparate units
of measure is necessary, especially given the broad range of data sets that may be employed.
Referring to Figure 2, for such disparate data sets, a normalization step 215, which is described
in more detail below, may be implemented. Generally, individual data sets are normalized by
scaling the data set with optimal scaling parameters calculated using a maximum likelihood
estimator. Normalization facilitates comparison of data sets taken from one or more
biomolecular component types.

An extraction step 220 is typically performed on the processed data. In the extraction
step, one or more list(s) of components, which exhibit statistically significant changes, are
extracted. The components typically are biological component types, or more specifically
biomolecular component types. Further, these changes also are quantified as part of the
extraction step. The extraction step typically involves a statistical analysis to discern the
differences and/or similarities between the data sets. The extraction step and associated
quantification of differences facilitate discerning similarities, differences, and/or correlations
between or among two or more biomolecular component types for the biological sample under
investigation.

Suitable forms of statistical analysis appropriate for [illegible]
component types include, e.g., principal component analysis ("PCA"), discriminant analysis
("DA"), PCA-DA, canonical correlation ("CC"), partial least squares ("PLS"), predictive linear
discriminant analysis ("PLDA"), neural networks, and pattern recognition techniques. In one
embodiment, PCA-DA is performed at a first level of correlation that produces a score plot, i.e.,
a plot of the data in terms of two principal components. Subsequently, the same or a different
statistical analysis is performed on the data sets based on the differences and/or similarities
discerned in the previous analysis.
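To make the notions of score and loading concrete, the sketch below computes the first two principal components of a small data matrix. It is plain PCA only: the discriminant step of PCA-DA is omitted, and the power-iteration solver is a choice made here so that the example needs nothing beyond the standard library. The data values are hypothetical (three samples per group, three measured variables).

```python
import math


def center(rows):
    """Subtract the column means from every row."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already centered rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    p = len(matrix)
    start = [float(j + 1) for j in range(p)]  # uneven start vector
    norm = math.sqrt(sum(x * x for x in start))
    vector = [x / norm for x in start]
    for _ in range(iterations):
        product = [sum(matrix[a][b] * vector[b] for b in range(p))
                   for a in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vector = [x / norm for x in product]
    return vector


def pca_two_components(rows):
    """Return (scores, loadings) for the first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    p = len(cov)
    loadings = []
    for _ in range(2):
        v = leading_eigenvector(cov)
        eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                         for a in range(p))
        loadings.append(v)
        # Deflate so that the next pass finds the following component.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(p)]
               for a in range(p)]
    scores = [[sum(r[j] * v[j] for j in range(p)) for v in loadings]
              for r in centered]
    return scores, loadings


# Hypothetical data: the two groups differ mainly in the first variable.
rows = [[2.0, 0.1, 1.0], [2.2, 0.0, 1.1], [1.9, 0.2, 0.9],
        [4.0, 0.1, 1.0], [4.1, 0.0, 1.2], [3.9, 0.2, 0.8]]
scores, loadings = pca_two_components(rows)
```

Plotting the two columns of `scores` gives a score plot in which the groups separate along the first component, while `loadings[0]` shows which input variable drives that separation.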

For example, in one embodiment, [illegible] score
plot, the next level of statistical processing may be a loading plot produced by a PCA-DA
analysis. [illegible] first level in that [illegible]
loading of individual input vectors to the PCA-DA [illegible]. For example, [illegible] each data
set includes a plurality [illegible] originating from one sample source [illegible]
contribution of a particular mass or range of masses to the correlations between data sets.
Similarly, for example, [illegible] represents one NMR spectrum [illegible] represents the
[illegible] correlation [illegible].

[illegible] among the [illegible] correlation [illegible] (or otherwise) of the
biomolecular component [illegible] vary in abundance between one or more groups
of samples. [illegible] For example, if both a gene and a protein are [illegible] 1 as compared to
group 2 and the [illegible] and protein are considered to be "correlated." Analogously,
biomolecular component types may be anti-correlated. Moreover, different "strengths of
correlation" exist, which depend on how tightly synchronous the relationship is between or among
the two or more biomolecular types.
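The idea of correlated and anti-correlated components, with differing strengths of correlation, can be sketched as a thresholded Pearson correlation network. This is an illustrative assumption about how such a network might be built, not the method disclosed here; the component names, abundance values, and the 0.8 threshold are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denominator = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if denominator == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denominator


def correlation_network(profiles, threshold=0.8):
    """Return edges {(a, b): r} for component pairs with |r| >= threshold."""
    names = sorted(profiles)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if abs(r) >= threshold:
                edges[(a, b)] = r
    return edges


# Hypothetical abundances: samples 1-3 are group 1, samples 4-6 are group 2.
profiles = {
    "gene_A": [1.0, 1.2, 0.9, 2.0, 2.1, 1.9],
    "protein_A": [0.8, 1.0, 0.9, 1.9, 2.2, 2.0],
    "protein_B": [2.0, 1.9, 2.1, 1.0, 0.9, 1.1],
    "metabolite_C": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
}
network = correlation_network(profiles)
```

A positive edge value marks a correlated pair, a negative value an anti-correlated pair, and the magnitude plays the role of the strength of correlation; components that do not track the group difference, such as `metabolite_C` here, receive no edges.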

A comparison step 230 is performed after the correlation networks have been established.
The correlation networks, [illegible] both correlations and anti-correlations,
are compared and evaluated based on existing knowledge of the component or biological system
under investigation. This knowledge relates to the associations which may [illegible] from
established sources such as research literature and/or experimental studies.

Subsequently, a perturbation [illegible] analysis.
The biological system subject to investigation is typically perturbed by changing an experimental
parameter and monitoring the system for a prescribed amount of time. Examples of
perturbations include, but are not limited to, introducing a drug, altering a gene, changing an
environmental condition, or making another suitable change. A perturbation also encompasses
the idea of comparing across species, i.e., performing the workflow on an animal system and
performing substantially the same workflow on a human system to investigate the similarities
and/or differences between or among species.

Following the perturbation step 235, new data sets and correlation networks are produced
240. Thus, as a result of the perturbations introduced into a given biological system or sample,
new data sets arise that are measurable. Similarly, as part of step 240, new correlation networks
may be developed based on those novel post-perturbation data sets. The statistically significant
changes in the new data sets, as determined in comparison to the pre-perturbation data sets, are
discerned by comparing the statistically significant biological component types in the new data
sets with the component types of the previous experimental results 245. In addition to looking at
the statistical changes between biomolecular component types before and after system
perturbation 245, correlation networks may likewise be analyzed. Therefore, the correlation
network associations may be compared before and after perturbation 250. After these
two levels of comparison 245, 250 have been performed, alterations or changes between
components and associations can be identified 255.
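The network-level comparison 250 can be pictured as a set comparison of edges. The sketch below assumes each correlation network is stored as a dictionary from a component pair to a signed correlation value; that representation, the function name, and the example values are hypothetical.

```python
def compare_networks(before, after):
    """Classify edges as gained, lost, or sign-flipped after a perturbation.

    Each network maps an (a, b) component pair to a signed correlation.
    """
    gained = sorted(set(after) - set(before))
    lost = sorted(set(before) - set(after))
    flipped = sorted(
        edge for edge in set(before) & set(after)
        if before[edge] * after[edge] < 0
    )
    return {"gained": gained, "lost": lost, "flipped": flipped}


# Hypothetical networks measured before and after a perturbation.
before = {
    ("gene_A", "protein_A"): 0.98,
    ("gene_A", "protein_B"): -0.99,
    ("metabolite_C", "protein_B"): 0.85,
}
after = {
    ("gene_A", "protein_A"): -0.90,
    ("gene_A", "protein_B"): -0.95,
    ("gene_A", "metabolite_C"): 0.88,
}
changes = compare_networks(before, after)
```

Edges that appear, disappear, or change sign are the candidates for the altered associations identified in step 255.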

Thereafter, perturbations to the system being investigated can be iterated 260. A
feedback loop results among the initial perturbations to the system, the system itself, the
production of new data sets, the comparison of significant components with the previous
experiment, the comparison of new correlation network associations with previous associations,
and the identification of changes. The feedback loop may be iterated until causal relations can
be identified 265 between multiple biomolecular component types and the correlation
networks which characterize their impact on the biological system.

Referring back to the normalization step 215 in Figure 2 and introduced above, a method
for normalizing gene expression data, protein data, and metabolite level data is now described.
A sample variety effect, an array effect, and a dye effect are introduced into a log-linear model,
and a maximum likelihood maximization technique is applied to calculate all the parameters of
the model and determine the optimal scaling factor for each array and dye. The normalization
method is generic and can be applied to a variety of data, experimental setups, and designs. The
model described below uses terminology from gene expression analysis. For example, the
"array" in a proteomics experiment could be one mass spectrometer run, and the "dye" could
describe all samples used [illegible].



Normalization model. The data matrix x is characterized by the gene index [illegible]
$v$ ($v = 1, \ldots, N_v$). For each variety $v$, there are $C_v$ samples corresponding to it, so
[illegible] data point is uniquely [illegible]

[equation illegible]

where the gene [illegible], the array effect by [illegible], the dye effect [illegible]
zero mean and the variance $\sigma^2$ [illegible], i.e., the variance is permitted to be different
for each gene and variety. The variety index $v$ is a unique function of $i$ and $k$ and can be
written as $\{i, k\} \in v$. Since the gene [illegible] regression levels be [illegible].

A maximum likelihood estimation is used to calculate the optimal scaling [illegible]
properly normalize [illegible]. Solving [illegible] leads to the following equations:

[equations illegible]

The optimal scaling factors for each array and dye are then:

[equation illegible] (5)

so the normalized expression levels are:

[equation illegible] (6)

Significance tests and bootstrap methods. The normalized data may be compared to a
null model, and a p-value may be calculated that measures the probability that the deviation of
the data from the null model can be attributed to the random error. The parameter used for
comparison is the fold ratio between the two chosen varieties. To evaluate the method, a t-test is
performed to compare the two chosen varieties. [Sheskin, Handbook of Parametric and
Nonparametric Statistical Procedures [illegible].] The corresponding
p-values were calculated for each gene. When assessing the statistical significance of fold
change for each gene, one needs to take into consideration the total $N_g$ p-values calculated, as
several p-values with p < [illegible] are expected. To account for this, the overall likelihood,
$P(p)$, of observing a p-value < p for any of the $N_g$ genes is used. Assuming independence of
all genes, the overall likelihood is estimated with:

$$P(p) \approx 1 - (1 - p)^{N_g}. \tag{7}$$

Assuming independence of genes is obviously an oversimplification, and the correct way
to calculate p-values and $P(p)$ values is by using the bootstrap method with the parameters
$(\mu, A, D, \sigma)$ of the null model being used to generate random data sets.
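Equation (7) is easy to evaluate directly, and inverting it gives the per-gene p-value that corresponds to a chosen overall likelihood. The sketch below covers only the independence approximation of equation (7), not the bootstrap procedure; the function names are hypothetical, and the gene count of 9,596 is taken from Example 1.

```python
import math


def overall_likelihood(p, n_genes):
    """Equation (7): P(p) = 1 - (1 - p)**n_genes, evaluated stably."""
    return -math.expm1(n_genes * math.log1p(-p))


def per_gene_cutoff(overall, n_genes):
    """Invert equation (7): the per-gene p that yields the given P(p)."""
    return -math.expm1(math.log1p(-overall) / n_genes)


n_genes = 9596
cutoff = per_gene_cutoff(0.05, n_genes)          # roughly 5.3e-6
naive = overall_likelihood(0.05, n_genes)        # effectively 1.0
```

Using `expm1` and `log1p` avoids the loss of precision that the direct form `1 - (1 - p) ** n` suffers when p is very small. The two results illustrate why a raw p < 0.05 criterion admits many more genes than the P(p) = 0.05 criterion.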

Example 1. Normalization of Gene Expression Data from the Liver of an APOE*3-Leiden
Transgenic Mouse

To illustrate the normalization method, a study of the ApoE3-Leiden transgenic mouse
was performed. A total of 9,596 genes were analyzed using ten cDNA microarrays. Samples
were collected from a total of four ApoE3-Leiden transgenic (TG) mice and four wild type (WT)
mice. An optimized design of the experiment is shown in Figure 3. The variety vector was
therefore

Vars = [1 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2 2 2 1]. (8)

A t-test was applied, comparing the normalized values of transgenic and wild type mice.
Figure 4 shows the significance plot of the data based on p-values from the t-test and fold ratios.
The horizontal line on top shows the overall likelihood P(p) = 0.05 cutoff, while the lower line
shows the cutoff p = 0.05. Only 16 genes satisfy the P(p) = 0.05 cutoff criterion, while there
are 713 genes in the p < 0.05 range.

Protein data from liver. Eight samples from eight different animals, four transgenic and
four wild type, were analyzed in eight experiments. The variety vector is therefore:

Vars = [1 1 1 1 2 2 2 2]. (9)

[illegible] containing 1600 peaks. The [illegible] developed [illegible] and is described in
U.S. Patent No. 6,743,364, [illegible] data set with IQ > 0.5 was retained [illegible]
fraction [illegible] in fraction 3, 454 in fraction [illegible]. [illegible] significance plot
is shown [illegible] peaks satisfying [illegible] = 0.05 cutoff, [illegible] there
are [illegible] peaks with p < 0.05. [illegible]
Synthetic "GIST" data. [illegible]
higher number of dyes, [illegible] data with 2000 peaks, 5 dyes, 3 varieties, and
6 experiments was performed. This [illegible]
performed using [illegible] and
Regnier, F., J. Chromatogr. A 949, 173-84 (2002)). The experiment design is shown in Figure 6
and can also be [illegible]

Vars = [1 1 2 2 3 2 2 1 1 3 3 1 1 2 2 3 2 2 1 1 1 1 3 2 2 2 2 3 1 1]. (10)

The background for each [illegible]
generator, set to equal mean and variance. Three large peaks have then been [illegible]
the variety 1 and 2, respectively. Figures 7-9 show the
scatterplots and normal [illegible] each of the varieties. The three outliers are clearly
seen for varieties 1 and 2. The fold ratio

$$\mathrm{Fold} = \frac{\langle \text{Variety [illegible]} \rangle}{\langle \text{Variety [illegible]} \rangle} \tag{11}$$

was calculated for each peak, and a t-test was used to compare the two varieties. The
significance plot is shown in Figure 10. As expected, only six outliers satisfy the
P(p) = 0.05 cutoff criterion, while there are a total of 94 peaks satisfying p < 0.05, despite the
fact each peak (except the six outliers) has been generated [illegible] sample
independently.

Illustrative examples of the work flow in Figure 2. Three additional examples are
disclosed herein to further illustrate the experimental methods, techniques, and analytic
approaches outlined in the flow diagram illustrated in Figure 2. More detailed flow diagrams are
presented in Figures 11, 12, and 13, which describe preparing a data set from a biological sample
and then extracting a list of either genes, proteins, or metabolites that exhibit a change in
abundance above the threshold value. Figures 11, 12, and 13 can be understood as a higher
resolution picture of Figure 2, and in particular, focusing on Steps 205 through 220 in Figure 2.
Figure 14 [illegible] integrating the extracted list of components to produce correlation networks
that can be used to compare the network associations with associations known in the literature
(Steps 220, 225 and 230 in Figure 2). To provide an even finer resolution picture of the
illustrated embodiments, individual Figures 15-29 are presented, which map directly onto
individual steps shown in Figures 2, 11, 12, 13 and 14.

Example 2. Systems Biology Analysis of the APOE*3-Leiden Transgenic Mouse

As a test case for the application of systems biology analysis to a mammalian system, the
apolipoprotein E3-Leiden (APOE*3-Leiden, APOE*3) transgenic mouse was selected. Apo E is
a component of very low density lipoproteins (VLDL) and VLDL remnants and is required for
receptor-mediated re-uptake of lipoproteins by the liver. [Glass and Witztum, Cell 104, 502
(198?).] The APOE*3-Leiden mutation is characterized by a tandem duplication of codons 120-
126 and is associated with familial dysbetalipoproteinemia in humans. [van den Maagdenberg et
al., Biochem. Biophys. Res. Commun. 165, 851 (1986); and Havekes et al., Hum. Genet. 73, 157
(1986).] Transgenic mice overexpressing human APOE*3-Leiden are highly susceptible to diet-
induced hyperlipoproteinemia and atherosclerosis due to diminished hepatic LDL receptor
recognition, but when fed a normal chow diet they display only mild type I (macrophage foam
cells) and II (fatty streaks with intracellular lipid accumulation) lesions at 9 months. [Jong et al.,
Arterioscler. Thromb. Vasc. Biol. 16, 934 (1996).]

APOE*3-Leiden transgenic mouse strains were generated by microinjecting a twenty-
seven kilobase genomic DNA construct containing the human APOE*3-Leiden gene, the
APOC1 gene, and a regulatory element termed the hepatic control region that resides between
APOC1 and APOE*3 into male pronuclei of fertilized mouse eggs. The source of eggs was
superovulated (C57Bl/6J x CBA/J) F1 females. Transgenic founder mice were further bred with
C57Bl/6J mice to establish transgenic strains. Transgenic and non-transgenic littermates of F21-
F22 generations were used in these experiments. All mice were fed a normal chow diet (SRM-
A, Hope Farms, Woerden, The Netherlands) and sacrificed at nine weeks, at which time plasma,
urine, and liver tissue samples were taken and frozen in liquid nitrogen. The samples from each
individual were then subdivided for separate gene expression, protein, and metabolite analyses.
The results of combined mRNA expression, soluble protein, and lipid differential profiling
analyses applied to liver tissue, plasma, and urine taken from wild type and APOE*3-Leiden
mice that were fed a normal [illegible].
Wild type [illegible] of the transgenic mice, in other
words, as control mice.

With reference to Figures 11-13, the biological condition 1105, 1205, 1305 to be
investigated is lipid [illegible], specifically atherosclerosis
and hyperlipidemia in an APOE*3-Leiden transgenic mouse. The samples collected 1110, 1210,
1310 were [illegible].

Liver gene expression. Referring to Figure 11, [illegible]
homogenized liver tissues using commercially [illegible] RNAeasy kits (Qiagen, Germantown,
Maryland). mRNA was then extracted [illegible] from the total RNA preparations using a
commercially bought Oligotex kit (Qiagen, Germantown, Maryland). Gene expression
microarray data were acquired using the Mouse UniGene 1 spotted cDNA array (Incyte
Genomics, St. Louis, Missouri). In one embodiment, an analysis of variance (ANOVA) model
was selected for the design of the sample pairings to optimally reduce [illegible] inherent in the
technique.

An mRNA abundance experiment 1120 was performed on the liver tissue. In one
embodiment, the experiment includes mRNA hybridization. Serial analysis of gene expression
and/or pattern recognition may be performed. In one embodiment, a PARC pattern recognition
program is used. Figure 15 illustrates an mRNA abundance experiment. In particular, a gene
expression analysis is illustrated by a mouse liver mRNA expression ratio plot for APOE*3
transgenic mice versus wildtype mice. Examples of gene expression data sets 1125 include not
only the liver gene expression analysis illustrated in Figure 15, but also the gene expression data
illustrated in Figure 16 and the gene expression abundance results illustrated in Figure 17.






Profiling of proteins extracted from the liver and plasma. Proteins were extracted
1215 from frozen liver tissue and plasma samples 1210. Chromatography steps 1220 may be
utilized to further characterize the sample. In one embodiment, the proteins are chemically
modified 1225 following the chromatography step 1220. In another embodiment, the proteins
are fragmented into peptides 1230 following either the chromatography steps 1220 or the
chemical modification step 1225. In one embodiment, fragmentation 1230 is performed by
partial hydrolysis of the proteins. A second chromatography step 1235 may follow the
fragmentation step 1230, and a mass spectrometry step 1240 may follow the chromatography
step 1235. In one embodiment, a PARC pattern recognition program is used to quantify the
proteins. A GIST isotopic labeling method may also be utilized. Identification of the proteins
may be performed with either mass spectrometry or BioSystematics.

Examples of protein-derived data sets 1245 are shown in Figures 18-20. Figure 18
illustrates intensity plots of LC/MS total ion chromatograms (TICs) of plasma from APOE*3
transgenic mice vs. wildtype mice. In Figure 19, TICs from LC/MS profiling, which can
elucidate subtle detectable differences, are shown. Both Figures 18 and 19 illustrate the
complexity of a data set 1245, as they are composed of greater than 1000 peptide peaks. Figure 20
illustrates LC/MS chromatograms acquired from the digested liver proteins of five transgenic
mice and five wildtype mice. In one embodiment, LC/MS is performed using an LCQ DecaXP
(ThermoFinnigan, San Jose, CA) quadrupole ion trap mass spectrometer system equipped with
an electrospray ionization (ESI) probe.

Profiling of metabolites extracted from urine and plasma. Metabolites were extracted
from the urine and plasma samples 1310. The urine samples were profiled using one
dimensional ¹H NMR 1315. NMR spectra are one example of a data set 1340. A data set 1340
also may be generated from the plasma data by a chromatography step 1320, and then followed
by a chemical modification of the metabolites 1325. The modified metabolites 1325 may be
characterized by a series of chromatography 1330 and mass spectrometry 1335 steps to generate
a data set 1340. In one embodiment, the plasma samples are ionized by ESI and characterized
using LC/MS.

Examples of metabolite data sets 1340 are shown in Figures 21 and 22. Figure 21
illustrates ¹H NMR spectra of metabolites extracted from plasma for APOE*3 and wildtype
mice. After referring to the -CH3 signal of MeOD (δ = 3.30), line listings were prepared using
the standard Varian NMR software. To obtain these listings, all resonances in the spectra above
a threshold corresponding to about three times the signal-to-noise ratio were collected and
converted to a data file format suitable for statistical analysis applications. Figure 22 illustrates
mass chromatograms of plasma lipids recorded using LC/MS for APOE*3 and wildtype mice.

Combining Data Sets. Referring back to Figures [illegible], in one embodiment, the gene
[illegible]
molecular functions and elucidate cellular mechanisms. A number of bioinformatics tools can be
utilized to [illegible]. The data sets 1125,
1245, 1340 are [illegible]
Figure 2). An IMPRESS algorithm may be used to reduce background noise in both LC/MS
chromatograms and NMR [illegible] the IMPRESS algorithm is used to
generate IQ [illegible].

In one embodiment, [illegible] 1345 is
treated with a statistical analysis step 1135, 1255, 1350. Suitable forms of [illegible]
described in more detail [illegible]
algorithm. In another embodiment, [illegible]
may be performed on the [illegible]. In one embodiment,
[illegible]
analysis.

Figure 23 depicts spectra treated by the normalization step 215. Individual gene, protein,
and metabolite [illegible]
normalized spectra are concatenated into a single factor spectrum. In Figure 23, the data were
measured on a biological sample extracted from mouse liver. Using the concatenated spectrum,
direct comparison across biomolecular component types may be performed.

Figures 24-25 provide an illustrative embodiment of the statistical analysis step 1135,
1255, 1350 and the subsequent inspection step 1140, 1260, 1355. For the sake of simplicity,
only the protein plasma analysis is presented, but the method can be extended to both genes and
metabolites. Figure 24 illustrates clustering of wildtype mouse data and APOE*3 transgenic
mouse data performed using a PC-DA 1255 on the peptide ion mass data. An inspection 1260 of
the two distinct clusters shown in Figure 24 reveals that the masses of the ions differentiate the
two clusters. Figure 25 shows the masses of the peptide ions exhibiting significant differences
plotted in a difference factor spectrum. In one embodiment, a t-test is applied to each of the
differentiating ions to test their significance. In another embodiment, loading plots are used
instead of factor spectra.






An additional mass spectroscopy analysis step 1265, 1360 may be performed to analyze
further the proteins, peptides, or metabolites that exhibit a change above a threshold abundance
level. In one embodiment, MS/MS is used to analyze and identify the proteins, peptides, or
metabolites. In another embodiment, genes, proteins, peptides, or metabolites that exhibit a
statistically significant change are identified during the manual inspection step 1140, 1260, 1355.
Subsequent to identifying all genes, proteins, peptides, and metabolites 1145, 1270, 1365, a list
of those genes, proteins, peptides, and metabolites is extracted and stored 1150, 1275, 1370 for
future comparison.

Figure 26 depicts an MS/MS spectrum of the peptides generated by hydrolysis of the
proteins extracted from mouse plasma, which corresponds to step 1265 in Figure 12. Those
peptide fragments, which are labeled b7-b17 and y5-y16, are compared to a database, so that the
protein which was fragmented can be identified and sequenced, which corresponds to the
identification step 1270 in Figure 12. In this particular case the protein identified is human
ApoE3, which is the protein introduced by the transgenic manipulation.

Table I lists the key differentially expressed components extracted from the lists of genes,
proteins, and metabolites. This list was generated in accord with steps 1150, 1275, 1370, which
are illustrated in Figures 11-13. The extracted list of components also corresponds to the extract
list of components step 220 in Figure 2.


Table I. Key differentially expressed biomolecular components (excluding human ApoE3).

Biomolecular component type   Component ID   Name                                Fold Ratio (APOE3:WT)
Gene                          G7801          Heat shock 70 KD protein            3.10
Gene                          G562           RIKEN cDNA 3230402M22               2.72
Metabolite                    M1             Triglycerides                       2.59
Metabolite                    M7             DAG C18, 20:1                       1.92
Metabolite                    M9             LysoPC C16:0                        1.68
Gene                          G7485          Apoptosis inhibitory 6              1.51
Protein                       P1059          FABP (fatty acid binding protein)   1.36
Gene                          G1615          Heterogeneous nuclear RNP H1        1.35
Gene                          G693           FABP (fatty acid binding mRNA)      1.33
Gene                          G1032          Translation Initiation Factor 2     1.14
Metabolite                    M3             PC C20, 20:8                        0.94
Gene                          G8147          Apolipoprotein A1                   0.76
Protein                       P744           Protein Kinase C, epsilon           0.74
Protein                       P451           ATP-binding cassette (ALD), mem1    0.72
Protein                       P1439          Heme oxygenase-2                    0.64
Protein                       P1362          IPF1                                0.59






In one embodiment, the individual biomolecular corqpbnents listed in Table I are 
nonnali^d^^ meaningful comparison across biomolecular -component types can be 
performed. In another embodim^ 

used to jiroduce a correlation network in accord with step 225 in Fig^e;2 and step 1420 in 
Figure 14, Figure 27 iUiistiates a cpire^on network b^een biomolecular Mir^on^ ^pes. 
Thenetwori:^^ featoec^n^ 
^spciatibi^ 

ffien^ybeTOmparedtoeri 

sbinx^ Wlnckp^ step 230 in Figure 2 142 

illustrates a^n^ o£tbe;taiown rdatipc^twe^ ~- 
pubUshed-infpi^tio^. 

^otiations 
depicteji ^ 

action l430vM^nei^ . 

assdcMve^ ty^es s 143^i Tbe knpwii 

relations*^ 

fciomblecular 1435^ 

Rfitur^^ As^edabpve, 
^thepertuAeds^ 

newcoixeii^ deducing tiievcausal mechanisms ofiheperturb^ioiS; 

The perbrbations to 4esyst^ are determined between 

multiple^ 

From the'Hon^ers determined fr^^ biology anaysis, similar to the one 

described abov^ diseased and healthy populations n^iy be derived, 

TMsin^ context to determine,,e.g., 

whetfkmark^: cto be^dentified as e&era causal 

dysregulated pathway. As described above, comprehensive gene, protein, and metabolite
profiling, coupled with correlation analysis and network modeling, provides insight into
biological context, and this level of knowledge may be used to develop therapeutic agents or
may serve as a basis for directed, hypothesis-driven experiments that are designed to further
elucidate pathophysiologic mechanisms.
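
The correlation network step referred to above can be sketched as follows. This is a minimal illustration only, assuming that an edge is drawn between two components whenever the absolute Pearson correlation of their per-animal profiles meets a cut-off. The profile values, the 0.8 threshold, and the function names are hypothetical and are not taken from the disclosure; only the component IDs are borrowed from Table I.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_network(profiles, threshold=0.8):
    """Return (id_a, id_b, r) edges whose |r| meets the (assumed) threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical normalized abundances across five animals (illustration only).
example_profiles = {
    "G693": [1.0, 1.2, 1.4, 1.3, 1.5],
    "P1059": [2.0, 2.4, 2.8, 2.6, 3.0],
    "G8147": [3.0, 2.6, 2.2, 2.4, 2.0],
    "M3": [1.0, 1.0, 1.1, 0.9, 1.0],
}
example_edges = correlation_network(example_profiles)
```

Edges with negative r link components that change in opposite directions, so the sign is kept rather than discarded.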






Figure 29 illustrates typical "offerings" or "deliverables," in terms of biomarkers or
therapeutic agents that can be derived from a systems biology analysis. Described below are two
examples that illustrate not only typical systems biology analyses, but also a more detailed
description of how the information derived from these systems biology analyses is employed to
determine not only which therapeutic agents should be used, but also which pathophysiologic
mechanisms require further study.

Example 1. Systems Biology Analysis of the APOE*3-Leiden Transgenic Mouse

The results of combined mRNA expression, soluble protein, and lipid differential
profiling analyses applied to liver tissue, plasma, and urine taken from wild type and
APOE*3-Leiden mice that were fed a normal chow diet and sacrificed at 9 weeks of age are presented
below. Results from each biomolecular component type class analysis reveal the presence of
early markers of predisposition to disease. In addition, results of a correlation analysis are
suggestive of networks of molecules - spanning genes, proteins and lipids - that undergo
concerted change.

Animals. APOE*3-Leiden transgenic mouse strains were generated by microinjecting a
twenty-seven kilobase genomic DNA construct containing the human APOE*3-Leiden gene, the
APOC1 gene, and a regulatory element termed the hepatic control region that resides between
APOC1 and APOE*3 into male pronuclei of fertilized mouse eggs. The source of eggs was
superovulated (C57BL/6J x CBA/J)F1 females. Transgenic founder mice were further bred with
C57BL/6J mice to establish transgenic strains. Transgenic and non-transgenic littermates of
F21-F22 generations were used in these experiments. All mice were fed a normal chow diet
(SRM-A, Hope Farms, Woerden, The Netherlands) and sacrificed at nine weeks, at which time plasma,
urine, and liver tissue samples were taken and frozen in liquid nitrogen. The samples from each
individual were then subdivided for separate gene expression, protein, and metabolite analyses.

Liver gene expression. Total RNA was extracted from homogenized liver tissues
using commercially bought RNeasy kits (Qiagen, Germantown, Maryland). mRNA was then
extracted from the total RNA preparations using a commercially bought Oligotex kit (Qiagen,
Germantown, Maryland). Gene expression microarray data were acquired using the Mouse
UniGene 1 spotted cDNA array (Incyte Genomics, St. Louis, Missouri). An analysis of variance
(ANOVA) model was selected for the design of the sample pairings that optimally reduces
variation inherent in the technique.






liver protein profiling. Froz»n Uyer &sues w in a.pre-chilled mortar that 

was kept&ia^ TtPBR protein extraction reagent (Pierce 

" Cftefficd ; ^ fiL/mg of tissue ^ and the sample was 

f^^fiqmogra^ed ^ byiomcation. S^ples were then centrifuged ait 10,000 x g fbr 5 minutes, 
andthe st^pi^^fe^coUectei Bjedatiye^to^ 

: 'ihtegra^^ exclusion 
chromat^ comistmg qfa'Super SW30OO TSKgel ^column (Tosoh.Biosep, Tokyo) 

^:ml£C?Packi^ To reduce: sample complexity, ihe : 

prbtemsi^ 

■ Wori^ £ali&^ — ' 

cbltnnn (&6^*^ 
wafer/ace^ 
were digestej^fe 

ic^acetamide at%75^G^fOT SO^iiinu^ por£4 L . 
hbMat;37°6. / : 

;^^p^ 

m^sspectrpmeter system equipped wi&^ The IX} cp^bnent 

c^nsi^d of a. Surveyor airt^ S an Jose, 

GA). Simplest and eiuted through a Vydac low^TFA CI 8 

colucm-(r50 * X#iv5 ppO (<3^ iTZiae cdlumn was elute&at'SO 

piymmute isocraticlyfbrt^ 

95/4.95/0:04/0^^^^ followed by a line^ ^gradient over 43 minutesto 75% Solvent 

B (watei^^ The electrpspray 

ionization voltage was set to 4.25 kV and the heated transfer capillary to 200°C. Nitrogen sheath
and auxiliary gas settings were 25 and 3 units, respectively. For quantification of tryptic
peptides, the scan cycle consisted of a single full scan mass spectrum acquired over m/z
400-2000 in the positive ion mode. Data-dependent product ion mass spectra (MS/MS) were also
acquired for peptide identification using the TurboSEQUEST algorithm (ThermoFinnigan, San
Jose, CA).






Liver lipid profiling. Liver tissue was freeze-dried, pulverized, and then extracted with
20 µL isopropanol per mg of tissue in an ultrasonic bath for 2 hours. The samples were then
centrifuged and the supernatants collected. Samples were then diluted with 4 volumes of water
and taken for LC/MS analysis. LC/MS data were acquired using an LCQ (ThermoFinnigan, San
Jose, California) quadrupole ion trap mass spectrometer equipped with an electrospray ionization
probe. The LC component consisted of a Waters 717 series autosampler and a 600 series single
gradient forming pump (Waters, Milford, Massachusetts). Samples were injected in duplicate, in
random order, onto an Inertsil column (ODS 3, 5 µm, 100 x 3 mm) protected by an R2 guard
column (Chrompack). Three mobile phases were used in the elution: (1)
(water/MeCN/ammonium acetate/formic acid, 93.9/5/1/0.1, vol/vol/vol/vol), (2)
(acetonitrile/isopropanol/ammonium acetate/formic acid, 68.9/30/1/0.1, vol/vol/vol/vol), and (3)
(isopropanol/dichloromethane/ammonium acetate/formic acid, 48.9/50/1/0.1, vol/vol/vol/vol).
The column was eluted at 0.7 ml/minute using a two-step gradient: Step (1) from 0 to 15
minutes beginning with 70 % A, 30 % B, 0 % C and ending with 5 % A, 95 % B and 0 % C, and
Step (2) a 20 minute gradient with no change in A, 95 % to 35 % B, and 0 % to 60 % C. The
electrospray ionization voltage was set to 4.0 kV and the heated transfer capillary to 250°C.
Nitrogen sheath and auxiliary gas settings were 70 and 15 units, respectively. For quantification
of metabolites, the scan cycle consisted of a single full scan (1 s/scan) mass spectrum acquired
over m/z 250-1200 in the positive ion mode.
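
As a check on the two-step gradient described above, the mobile phase composition at any time point can be interpolated from the stated breakpoints. The sketch below assumes linear ramps between breakpoints and that A, B, and C correspond to mobile phases (1), (2), and (3); the function and variable names are illustrative only.

```python
# Breakpoints of the two-step gradient: (time in minutes, %A, %B, %C).
GRADIENT = [
    (0.0, 70.0, 30.0, 0.0),
    (15.0, 5.0, 95.0, 0.0),
    (35.0, 5.0, 35.0, 60.0),
]


def composition_at(t, program=GRADIENT):
    """Linearly interpolate (%A, %B, %C) at time t (minutes)."""
    if t <= program[0][0]:
        return tuple(program[0][1:])
    for (t0, *c0), (t1, *c1) in zip(program, program[1:]):
        if t0 <= t <= t1:
            frac = (t - t0) / (t1 - t0)
            return tuple(a + frac * (b - a) for a, b in zip(c0, c1))
    # Past the last breakpoint: hold the final composition.
    return tuple(program[-1][1:])
```

For example, `composition_at(25)` falls halfway through Step (2), and the three percentages sum to 100 at every time point, which is a quick consistency check on the gradient table.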

LC/MS data pre-processing. LC/MS data sets were converted into ANDI (.cdf) format
using the File Converter functionality built into the Xcalibur instrument control software
(ThermoFinnigan, San Jose, California). The IMPRESS algorithm (TNO Pharma, Zeist, The
Netherlands) was then applied to the converted files for automated peak detection and peak data
quality assessment. The program evaluates each mass trace for its chromatographic quality by
assessing its information content. The LC/MS chromatogram at each mass to charge ratio was
smoothed to remove noise spikes and then the entropy of the trace was calculated using Equation
12. Taking the reciprocal value of H and scaling all results to the largest value gave each mass
trace a scaled chromatographic quality number called the Impress Quality (IQ):

H = -\sum_{i} p_i \log(p_i)    (12)

An IQ threshold was then selected, and if the IQ of a peak was below the threshold, the peak was
deemed to be of poor quality and was not taken forward to clustering analyses described below.
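
The quality scoring described above can be sketched as follows. This is not the IMPRESS implementation, whose internals are not given here. It assumes that p_i is the intensity of scan i divided by the summed intensity of the trace, it omits the smoothing step, and the traces and the 0.5 cut-off are made up for illustration.

```python
import math


def shannon_entropy(trace):
    """Entropy H = -sum(p_i * log(p_i)) of one mass trace.

    Assumption: p_i is the intensity at scan i divided by the trace total.
    """
    total = float(sum(trace))
    if total <= 0:
        return math.inf  # an empty trace carries no usable signal
    h = 0.0
    for x in trace:
        if x > 0:
            p = x / total
            h -= p * math.log(p)
    return h


def impress_quality(traces, eps=1e-12):
    """Scale the reciprocal entropies so the best trace has IQ = 1."""
    recip = {mz: 1.0 / max(shannon_entropy(t), eps) for mz, t in traces.items()}
    top = max(recip.values())
    if top == 0.0:
        return {mz: 0.0 for mz in recip}
    return {mz: r / top for mz, r in recip.items()}


# Hypothetical traces: a sharp chromatographic peak versus a flat baseline.
example_traces = {
    446: [0.0, 0.0, 1.0, 50.0, 100.0, 50.0, 1.0, 0.0, 0.0, 0.0],
    500: [5.0] * 10,
}
example_iq = impress_quality(example_traces)
kept = [mz for mz, iq in example_iq.items() if iq >= 0.5]  # assumed threshold
```

A trace whose signal is concentrated in a few scans has low entropy and therefore a high IQ, while a flat, noise-like trace spreads its intensity evenly and scores low.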






Normalization of microarray data. As described above, the data may be represented
by the following model:

where the . gene arid variety effects are described by the axxay effectby^,, the dye effect by 

. Z)^ .Md tiie error by Hie error is normally distributed with zero mean, an^ 

q*,, is riot permitted to be differeritfbr etoh,g|me^d variety. The optimal parameters ^of ihe 

model are c^c^ For each particular array and-dye, 

the samples are then scaled as: 

Statistical tests of significance. To estimate the statistical significance of difference 
m^noMa^zed intensities frpin^tiraMg^c and wdldtypes^ 

df the ^igen^, sind the con^6h(pg jp-values were calculated: When;assessing:the statistical 
significance of Md ^change for 

< • a05 were expected. To accouitf fortpi, ffeoyeraU^^ 
value for^yqf^ 
likeliho o d was estimate 

n&~l^-pf„ (15) 

PCDA analysis and correlation plots. Principal component and discriminant analyses
(PCDA) were applied to the tryptic peptide and lipid LC/MS profiles that had been
pre-processed with the IMPRESS algorithm as described above. This was done using WINLIN
statistical software (TNO Pharma, Zeist, The Netherlands).

Microarray analysis of liver gene expression. Mouse liver mRNA samples were paired
for hybridization on the UniGene 1 cDNA spotted microarrays following the "loop design"
shown in Figure 30A. This method of pairing was based on an ANOVA model that was
designed to provide a basis for optimal normalization of gene expression data and to minimize
the contribution of variability that might arise from factors such as unequal rates of
hybridization between nucleic acids or dye effects. mRNA samples were labeled with Cy3 and
Cy5 for dual hybridization, as shown

As evidenced by the cDNA microarray data scatter plot shown in Figure 30B, relatively
few genes were differentially expressed at the 95 % confidence level. Values were plotted as






mean values of expression in wild type and APOE*3-Leiden transgenic mice, and data points
were color-coded on the basis of statistical significance. Far fewer met a more rigorous overall
likelihood, P(p), assessment that attempts to rule out chance events where data may randomly,
but falsely, have p-values < 0.05.
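
The reason a per-gene cut-off of p < 0.05 admits chance events can be illustrated with a short calculation. The sketch below uses the standard result for independent tests and is offered as an assumption about the general idea only. It is not a reconstruction of the overall likelihood P(p) used in this work, and the number of tests in the usage note is hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests passing by chance when no true effect exists."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests
```

With, say, 1000 genes tested at alpha = 0.05, about 50 would be expected to pass by chance alone, which is why a stricter overall assessment removes many of the nominally significant genes.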

Table II lists a sample set of genes where the fold-ratio between transgenic and wild type
control was either less than 0.8 or greater than 1.2. The relatively low p-values that were
observed despite the rather narrow margins of difference in expression reflect the statistical
advantages of the ANOVA model. Of note are the lower levels of expression of apolipoprotein
AI and an analog of apolipoprotein B in the transgenic animals, while an analog of
apolipoprotein F was higher. Interestingly,

Leiden mice revealed an approximately twofold down regulation at the protein level. In
addition, peroxisomal proliferator-activated receptor-alpha (PPARα) expression was not
different between the two populations, while liver fatty acid binding protein (L-FABP) was 43 %
higher in the transgenics. PPARα plays a key role in initiating gene expression of proteins
involved in lipid metabolism, while experimental evidence suggests that L-FABP may control
the activity of the transcription factor by controlling the rate of presentation of activating ligand.
The lipid profiling analysis shows that lipid metabolism is indeed impacted by the presence of
the transgene, and in the absence of change in PPARα levels, these data support a regulatory role
for L-FABP.






Table II. Liver mRNA expression.



Description 



claudin 4
CD8beta opposite strand
Iroquois related homeobox 3 (Drosophila)
cysteine rich protein
Apolipoprotein A-I

fatty acid binding protein 5, epidermal

ESTs^M^ IS6333 apolipoproteinB -rat 

plexin 6; 

nitric oxide 'syntiiase 3j endolhelM 
ornithine aminotransferase 
gJutaMoM (Yaj 

mda^^ , ... ^ 

X^53 anfigen 

ESTs, Weakly simUktoap^ pLsajiiehs] 
receptor (c^^ : 
<c^dir^^ 
eos^op^ 

^(feme,c;03& siibunitVila^; . " - 
lxistidine^ 

maikte defiydrogemse^sbluble 
Mmiisculus; H2B • gene 

Al^ase^ ^ (v^olar projoii piimp) 

Air^iili^ ; 

gangUosidcs-i^^ 

sdlute c^er faMiy 35 (XIDP-galactbise ixansporter), member 2 

j^bserT^^ated p^tean, 58^a 

spennidine/spenii^ 

fatty acid binding protein 1, liver

signal recognition particle 9 kDa

orosomucoid 2

cathepsin S



Ratio    p-value



nucleobindin 2
orosomucoid 1
serum amyloid A3
major urinary protein 1

DnaJ (Hsp40) homolog, subfamily C, member 3

SEC61, gamma subunit (S. cerevisiae)
calcium binding protein A11 (calgizzarin)
tumor rejection antigen gp96
proteoglycan, secretory granule

heat shock 70kD protein 5 (glucose-regulated protein, 78kD)



0.59 


0.001 


0.69 


0.003 


0.72 


0.001 


0.74 


0,00(5 


0,75 


0.009 


0.75 


0044* 


0:7$ 


0 ; 043 : 


0.77 


0.019 


0,81 


0.0i8 


1.22 


0.016 


mz 

1.28 


0.029 

0 002: 

0.027 : 


1.28 
L28 

m 


0.037 
0.028; 
0.032 


1.29 
131 
1.32 " 


0:040 
: 0,013 
0 044 


1.33 
1.33 
1.34 


v«\TTT 

0.031 

0;023 

0:021 


1.34 
1 39 
1 .40 


0.048 
0 01 8 


L4tf 
1.42: 


0012 
0.024 


L43 
1 43 


0 021 
O 030 


1.43 
1.45 


0 094 
0 034 


1M 
1 48 


0;020 

0 033 


X • « w 
1 AQ 


A (\(Y7 

V.W7* 


130 


0.015 


1.51 


0.009 


1.51 


0:001 


1.56 


0.005 


1.58 


0.012 


1.60 


0.008 


1.70 


0.004 


2.01 


0.003 


2:45 


0.001 


2.93 


0.001 






Quantitative profiling of liver proteins. Off-line reversed phase separation of soluble liver
proteins to decrease the sample complexity by approximately a factor of 20 was initially
employed. An ESI-LC configuration that was capable of handling hundreds of consecutive
injections was coupled to the mass spectrometer. Next, data were acquired using an MS-only scan
cycle, without acquisition of sequencing MS/MS scans, to reduce cycle time and minimize the
loss of information that occurs while the column elutes between scans.
As shown in Figure 31A, LC/MS chromatograms were acquired for digested liver protein
fractions from five APOE*3-Leiden and five wild type mice. The IMPRESS algorithm was then
applied to each data set to extract peak intensity and signal quality information. An IMPRESS
quality value of 0.5 was selected as the threshold below which poor quality signal data would be
excluded from further analysis. Clustering was then performed using the principal
component-discriminant analysis (PCDA) tool built into the WINLIN software. As shown in Figure 31B,
two distinct clusters were observed with transgenic mice in one and wild type mice in the other.
An inspection of the factor spectrum, illustrated in Figure 31C, provided masses of the ions that
differentiated the two clusters. A t-test was applied to each of the differentiating ions to test
significance, and an LC/MS/MS spectrum was acquired for each peptide. Six tryptic peptides
that were each derived from a digestion of L-FABP, with mass to charge ratios 446, 599, 706,
892, 895, and 1058, are labeled in Figure 31C. Since the factor spectrum is semi-quantitative in
nature, peak intensity information gathered by IMPRESS was used to calculate relative
differences. The results of this profiling analysis indicated that L-FABP was up-regulated by
44% in transgenic mice relative to wild type controls. This was essentially a one-to-one
correlation with the mRNA expression observation noted above. Table III summarizes the
results of the protein analysis.
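
The relative difference and significance test described above can be sketched for a single differentiating ion. The peak intensities below are invented for illustration (five animals per group, chosen so that the ratio comes out at 1.44). The text states only that a t-test was applied, so the unequal-variance (Welch) form used here is an assumption.

```python
import math
from statistics import mean, variance


def fold_ratio(tg, wt):
    """Ratio of mean peak intensity, transgenic over wild type."""
    return mean(tg) / mean(wt)


def welch_t(tg, wt):
    """Welch t statistic for two independent groups (assumed test form)."""
    se = math.sqrt(variance(tg) / len(tg) + variance(wt) / len(wt))
    return (mean(tg) - mean(wt)) / se


# Hypothetical IMPRESS peak intensities for one L-FABP peptide ion.
tg_intensity = [144.0, 150.0, 138.0, 146.0, 142.0]
wt_intensity = [100.0, 104.0, 96.0, 102.0, 98.0]
ratio = fold_ratio(tg_intensity, wt_intensity)
t_stat = welch_t(tg_intensity, wt_intensity)
```

Converting the t statistic to a p-value requires the t distribution, which is left to a statistics package.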






Table III. Liver protein expression.



Description    Ratio TG/WT    p-value


Apoptosis protein MA-3
Asialoglycoprotein Receptor 1

ATP-Binding Cassette, Sub-Family D (ALD), Member 1

E^JNlouse^ 
Kinase Kgand: 3) 
Liver Fatty Acid Binding Protein


0.85: 
1.39 
6.72 
0,76 

. 0.52 
1.44 


0,019 
0.028 
0.025 
0.016 

0.005 
0.036 


Forkhead;^ 
Glutatbione-S-Transferase 

Guahme Nucleotide Exchange Farf6rEit2B Delta Chain; 
LongrForm , ■'. /, ;:. " " Tr " ■', 
*Guj^e]^ 


• 1.24 
0,69 

~ 1.36 


6.008 
0.015 

0.002 


.Heme,Oxygenase^2;H0-2> 
Hemopoietic; CeU?nd 
Homeotic ^ ^Precursor 


1.59 
0.64 
1.38 
0&8 


01014 
6.014 
0.020 
0.060 


LithiumJSjensitive;M^ 

RfflerCeUtLectin^ 

LjropMc^e ^tigeaa 78 - 

Md6Prpteih •, 

Mouse Fat -1 Cadherin 


o.ei 

1.98 
0.53 
0.67 
085 


0 034 
0J019 
0.007 
0 035 
0.019 


Nodal, Mouse Nodal Precursor
Numb-Binding Protein Lnxp80
Probable E1-E2 ATPase - Mouse (Fragment)
Procollagen, Type V, Alpha 2
Protein Kinase C, Epsilon


0.52
1.12
0.83
1.20
0.74


0.006 
0.032 
0.034 
0.012 


Pynwate Kinase 
ireiqui^Protein Ijgase E3a 


1.24 
1.15 


0.105: 
6.008 
0.079 



Quantitative profiling of liver lipids. Lipids were profiled using a strategy similar to that used
for the protein analysis. Duplicate datasets were acquired for each animal. The extraction
protocol and LC system were designed to fractionate larger, non-polar lipids such as
diacylglycerols (DG) and triacylglycerols (TG). Captured within this acquisition were also
quantitative profiles of phosphatidylcholine (PC) and lysophosphatidylcholine (LysoPC) lipids.
Following data pre-processing with IMPRESS to obtain peak information, PCDA clustering
analysis was performed using WINLIN. As shown in Figure 32A, the two populations of mice
formed two distinct clusters. The PCDA factor spectrum, illustrated in Figure 32B, indicates that
a number of lipids contribute to the separation between the clusters. Mass to charge
ratio ranges that include the majority of lysophosphatidylcholines (LysoPC), diacylglycerols
(DG), phosphatidylcholines (PC), and triacylglycerols (TG) are indicated.






As summarized in Table IV, a number of triacylglycerols were higher in the transgenic
mice, while none were found to be in lower abundance. Similarly, two
lysophosphatidylcholines, 1-palmitoyl-2-hydroxy-sn-glycero-3-phosphocholine (LysoPC C16:0)
and 1-stearoyl-2-hydroxy-sn-glycero-3-phosphocholine (LysoPC C18:0), were found at higher
levels in the APOE*3-Leiden mice, while there were no significant differences observed for
other LysoPCs. Interestingly, among the diacylglycerol and phosphatidylcholine sub-classes, an
overall trend toward higher abundance in the transgenic animals was not observed, suggesting
that the disruption of lipid metabolism imposed by insertion of the transgene leads to a complex,
multifactorial change in the regulation of lipid levels.






Table IV. Liver lipids: Fold difference between APOE*3-Leiden transgenic mice and the
wild type control mice.



Lysophosphatidylcholine



Diacylglycerol



Phosphatidylcholine



Triacylglycerol



Species


Ratio TG/WT


p-value


C16.0 


1.31 


0:0190 


G18:6 


1,24 


0.0241: 


Ci8-C20:i 
G22;2ib:i 
G22i22:10 
€22£2:3 

G1.8il8:0; 
€20;18:2 
€20,20:8 


1.43. 
0.78 
0:80 
\ 0.77 

■ ■ \: ■ 

0i79 
tiJT 


0.0064 
0.0151. 
0.0018. 

-0JD092 ; 

6.0231 

0.0422 


C20,20:7 

€20,20:4 

€20,22:3 

€20,22:2 

d20^2¥ • 

€22>22:4 

€22£2:3 


0:82: 
1,50 > 
.. : 2;75 : 

1;85 " - 

1.20 

2v82 
: :1.84 


0:0341 
0.0138 

o.oooi 

0.0023 
0:0059 

o:66o5 
0:0002 - 


€22,2^:2 : - 




9.4E-06 


€50:0 


2.02 


9.7E-07 


€56:7 


1.87 




€56:6' 


1.96 


2.8E408 


€56:5 
€56:4 


1.60 
1.97 


0.0003 

o-ooob: 


€56:3 
€56:2 
C58:10 


1.84 
2.15 
5.38 


0:0058 
0:0069 

o:boo4 


C58:9 
C58:8 
€58:7 


2.94 
2.43 ■ 


&05E-06 
1.4.3E-07 


1.93 


6:78E-10 


€58:6 


2.42 


140E-09 


€58:4 


• 2.70 


1-62E-05 


C58:3 


2.15 


0:0001 


€58:2 


1.37 


0.0077 



Discussion. As highlighted in Figures 33A-33C, the comprehensive systems analysis
based on differential genomic, proteomic, and lipid profiling revealed a number of novel
observations that distinguish the APOE*3-Leiden transgenic mouse from wild type controls
under conditions where the mice display essentially no clinical indications of disease. Following
PCDA clustering analysis and identification of differentiating factors, the relative abundance of






each biomolecular component, gene, protein, and lipid, was calculated and is shown in
Figures 33A, 33B, and 33C, respectively. Values represent the mean ± SEM for n = 4-5 separate
animals (* p < 0.05). Taken individually, each of these entities may serve as a biomarker of an
altered metabolic state that predisposes a subject to hyperlipidemia and atherosclerosis.
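
For reference, the mean ± SEM values quoted above follow the usual definition, in which the standard error is the sample standard deviation divided by the square root of n. A minimal sketch, using made-up relative abundances for five animals:

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group of animals."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical relative abundances for n = 5 animals (illustration only).
group_mean, group_sem = mean_sem([1.2, 1.4, 1.3, 1.5, 1.1])
```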

Key species in atherosclerosis identified as early markers of disease in the APOE*3-Leiden
mouse are illustrated in Figure 34. In humans, the APOE*3-Leiden mutation gives rise
to a dysfunctional apolipoprotein E variant that has reduced affinity for the low-density
lipoprotein receptor (LDLR). Similarly, APOE*3-Leiden transgenic mice also develop
hyperlipidemia and are susceptible to diet-induced atherosclerosis. Early markers of pathology
that were found via systems biology in young mice that were reared on a normal chow diet are
indicated with arrows (upward pointing denotes up-regulation in the transgenic, while downward
pointing denotes down-regulation in the transgenic). These markers include ApoAI and
L-FABP mRNA and protein, and a variety of lipid molecules. For example, lipoprotein-associated
phospholipase A2 (which is also described as platelet activating factor acetylhydrolase) is an
enzyme that catalyzes the generation of LysoPC from PC in circulation and has been identified
as a risk factor for heart disease. [Packard et al., N. Engl. J. Med. 343, 1148 (2000).] LysoPCs
contribute to early pro-inflammatory events in pathogenesis, where they increase
monocyte adhesion and chemotaxis during fatty streak development. In the present study, two
LysoPC compounds that are elevated in the livers of APOE*3-Leiden transgenic mice were
identified, suggesting that early inflammatory events in the liver may play a role in the
pathogenesis of atherosclerosis.

The apolipoproteins and L-FABP constitute a second macromolecular group of
biomarkers. Apolipoprotein AI (ApoAI) is significantly lower in the plasma of APOE*3-Leiden
mice compared to wild type controls. Here, mRNA transcripts for this apolipoprotein were
found to be lower in the liver, bolstering the previous observation and therefore supporting a role
for lowered ApoAI and HDL levels as contributing factors to predisposition to disease.

Evidence for elevated L-FABP was also provided by both genomic and proteomic
analyses. ApoE-deficient mice that were also deficient for adipocyte fatty acid binding protein,
aP2, were protected against atherosclerosis via a mechanism involving impaired macrophage
function. [Makowski et al., Nat. Med. 7, 699 (2001).] L-FABP is a member of the same family of
intracellular fatty acid binding proteins. It is believed to play a role in transcriptional regulation
by acting as a shuttle for ligands of PPARα. [Wolfrum et al., Proc. Natl. Acad. Sci. USA 98,
2323 (2001).] In humans, ApoAI expression is transcriptionally regulated by PPARα. Of






particular interest, the results of the present study show an uncoupling of the relationship
between^ 

^A^lwels^ 
^gg^thatan^ 

^conclusion, we have sho^^ 

pre&sppi^^^ Taken, 
v coU«tively^coUectipnS:0^ 

■ ea^ed 
JSas|«^ 
iht^ehtipriv 

'tM/- V- * . transgenic mouse model } :' V; : !T . // " r . ? " 

A platform 

underi)^ 
20 a&$}$e^idi^ 

m^olitevco^^ ••, 

results coiifiinxkiwHra metabolism pi^s 

iipSprot^ 

ae -b vei^ approach to systems ^ysis, a: whole plasma pai^d 
profiling scheme, applied in this study is schematically outlined in Figure 35. Whole plasma,
lipid, and protein fractions from ApoE*3-Leiden and control mice were analyzed by NMR and
MS. Both metabolic and protein data sets were

clustered simultoeously uMng ; TO ihetext Separation 

ai$ ^ such as;HPLC, NMR and LG/MS, were wmbined with 

.* 30 ppweiM stati^^ discntiiinani ^ analysis, Ao rabidly 
cluster and identify biochemical constituents in plasma of control vs. genetically perturbed
animals. The results show major (> 2-fold) and less obvious but statistically significant (p <
0.05, t-test) differences at the protein and metabolite levels.




Animals. APOE*3-Leiden transgenic mouse strains were generated by microinjecting a
twenty-seven kilobase genomic DNA construct containing the human APOE*3-Leiden gene, the
APOC1 gene, and a regulatory element termed the hepatic control region that resides between
APOC1 and APOE*3 into male pronuclei of fertilized mouse eggs. The source of eggs was
superovulated (C57BL/6J x CBA/J)F1 females. Transgenic founder mice were further bred with
C57BL/6J mice to establish transgenic strains. Transgenic and non-transgenic littermates of
F21-F22 generations were used in these experiments. All mice were fed a normal chow diet
(SRM-A, Hope Farms, Woerden, The Netherlands) and sacrificed at nine weeks, at which time plasma
tissue samples were taken and frozen in liquid nitrogen. The samples from each individual were
then subdivided for separate protein and metabolite analyses.

Plasma lipoprotein profiling. Plasma from 9-week old mice that were kept on regular
chow diet (SRM-A, Hope Farms, Woerden, The Netherlands) was fractionated by size exclusion
chromatography through a Super SW3000 TSKgel column (Tosoh Biosep, Tokyo) on an LC
Packings chromatography system (Dionex, Marlton, NJ). Total protein concentration for each
sample was determined by the Bradford assay and 10 µL of whole plasma normalized to the
lowest concentration was injected and eluted isocratically in 20 mM Bis-Tris Propane, pH 6.9; 100
mM NaCl at 50 µL/minute. Base-resolved peaks corresponding to molecular weight ranges of
greater than 300 kD were collected as discrete fractions. Proteins were digested, thermally
denatured and reduced in 100 mM ammonium bicarbonate, 5 mM calcium chloride and 10 mM
dithiothreitol at 75°C for 30 minutes, alkylated with 25 mM iodoacetamide at 75°C for 30
minutes, and then digested with 0.3% (w/w trypsin/protein) for 24 hours at 37°C.

Protein LC/MS analysis. Liquid chromatography-mass spectrometry (LC/MS) was
performed using an LCQ DecaXP (ThermoFinnigan, San Jose, CA) quadrupole ion trap mass
spectrometer system equipped with an electrospray ionization probe. The LC component
consisted of a Surveyor autosampler and quaternary gradient pump (ThermoFinnigan, San Jose,
CA). Samples were suspended in mobile phase and eluted through a Vydac low-TFA C18
column (150 x 1 mm, 5 µm) (Grace Vydac, Hesperia, CA). The column was eluted at 50
µL/minute isocratically for two minutes with Solvent A (water/acetonitrile/acetic
acid/trifluoroacetic acid, 95:4.95:0.04:0.01, vol/vol/vol/vol) followed by a linear gradient over
43 minutes to 75% Solvent B (water/acetonitrile/acetic acid/trifluoroacetic acid,
20:79.95:0.04:0.01, vol/vol/vol/vol). The electrospray ionization voltage was set to 4.25 kV and
the heated transfer capillary to 200°C. Nitrogen sheath and auxiliary gas settings were 25 and 3






units, respectively. For quantification of tryptic peptides, the scan cycle consisted of a single full

scan'iiftss ^ectnm acquired over ^ Data-dependent 

product abnm^s^ 

TurbbSEQl^ 

Swss^dtarid^MSDBd 

Metabolite analysis. The mouse pla^^ global lipid and 

m^^Kt^ 

centiif^ A 500 h& ^ was; - 

To prepare 

; ^sppl^^ ofwaterv^^ 

u^g^ An exponential winddwfi^^ of : 

0i5:5^^aSEd^q^ A&rreferangtothe -CD3 

IgM^f CE>3pD {^3.3Xp> line listings were ^ 

abort ^ 
statistic^a^y^ 

plasm^lipid and m^olite 

717>series aposampler and a 600 series single gradient-foiining pum Corporation 
j^oi^ ^niples wereiiy^ 

mm) protected by an R2 guard column (Chrompack). A 75 µL aliquot of mouse plasma extract
was injected in random order. The random sequence was applied to prevent detrimental

effects of possible drift during analysis on the statistics. The

elution gradient was formed by using three mobile phases: (1) (water/acetonitrile/ammonium
acetate (1M/L)/formic acid, 93.9:5:1:0.1, vol/vol/vol/vol), (2) (acetonitrile/isopropanol/






ammonium acetate (1M/L)/formic acid, 68.9:30:1:0.1, vol/vol/vol/vol), (3)
(isopropanol/dichloromethane/ammonium acetate (1M/L)/formic acid, 48.9:50:1:0.1,
vol/vol/vol/vol). The samples were fractionated at 0.7 mL/minute by a four-step gradient: (1)
over 15 minutes going from 30% to 95% buffer B; (2) 20 minute gradient from 95% to 35% B
and 60% C with a 5 minute hold at this step; (3) rapid one minute gradient of 35% B and 60% C
going to 95% and 0% respectively; and (4) 95% buffer B going back to 30% over a 5 minute period.

The electrospray ionization voltage was set to 4.0 kV and the heated transfer capillary to
250°C. Nitrogen sheath and auxiliary gas settings were 70 and 15 units, respectively. For
quantification of metabolites, the scan cycle consisted of a single full scan (1 s/scan) mass
spectrum acquired over m/z 200^700 in the positive ion mode.

Data pre-processing NMR. The NMR spectra were aligned manually with the WINLIN
statistical software package (TNO Pharma, Zeist, The Netherlands).

Data pre-processing LC/MS. The LC/MS data files were converted to NetCDF format
using Xcalibur software (ThermoFinnigan). The converted files were evaluated with IMPRESS
post acquisition noise reduction and normalization software (TNO Pharma, Zeist, The
Netherlands) to obtain a fingerprint spectrum for each of the LC/MS files. The program
evaluates each mass trace for its chromatographic quality by assessing its information content.
This is performed after smoothing to remove spikes and by calculating for each mass the
entropy of the trace according to Equation 12. Taking the reciprocal value of H and scaling all
results to the largest value gives each mass trace a scaled chromatographic quality, or IQ.

PCA and PC-DA analysis. Principal component (PCA) and discriminant analysis
(PC-DA) were applied to the fingerprint spectra of the aligned plasma NMR spectra and IMPRESS
preprocessed LC/MS spectra. This was done using WINLIN statistical software (TNO Pharma,
Zeist, The Netherlands).

Differential metabolic NMR analysis. To evaluate the pattern recognition and
clustering methods for metabolite analysis, a dual approach was used, where NMR was utilized
as the initial screening method followed by LC/MS, which has been established as a benchmark
analytical method for metabolome profiling in a variety of biological systems. [Raamsdonk et
al., Nature Biotech. 19, 45 (2001); Nicholson et al., Xenobiotica 29, 1181 (1999); Fiehn et al.,
Anal. Chem. 72, 3573 (2000).] To facilitate NMR data processing, the WINLIN software
package was applied to cluster and estimate the degree of variance between the wild type and






transgenic data sets. Sufficient differences, based on the preliminary NMR screen, have
emerged to warrant further detailed analysis using MS and MS/MS.

Whole plasma samples from 20 mice (n=10 for each group) were used for global
metabolite NMR analysis. For a typical 400 MHz 1H NMR, 750 µL of deproteinated sample in
MeOD were used to generate triplicate spectra, which are illustrated in Figure 36, for both the
wild type mouse plasma sample (WT) and the Leiden mouse plasma sample (TG). After
referencing to the -CH3 signal of MeOD (δ = 3.30), line listings were prepared using the standard
Varian NMR software. To obtain these listings, all resonances in the spectra above a threshold
conespondhigtoibbMptee um« 
data file format suitable for statistical analysis applications. The intent for using NMR
fingerprinting for initial analysis of plasma metabolite components was not to assign signals to
specific compounds, but to establish whether the samples exhibit sufficient clustering and thus
warrant a more detailed analysis. Close examination of the NMR data revealed small variations
in the resonance position of comparable lines. Variations in the positions of lines are due to the
15 relatiye/con^ntotipnof me «>mp6 ^ ^ 

the temperature ar«l the homogeneity of the magnetic field, which were corrected for manually. 
Spectra processed in this manner were imported into the WmiN statistical analysis tool for 
discrimm^t component analysis (PC-DA) dustering. 

Figure 37 illustrates a PC-DA score plot showing clustering of NMR data for the Leiden
mouse, represented by triangles, and the control mouse, represented by [illegible]. WINLIN
allows graphical clustering of results after the data are normalized and subjected to principal
component analysis (PCA). Each point within the cluster is spatially positioned to represent one
of the triplicate sets of the preprocessed spectra. Concentration intensities from each of the
triplicate spectra were used to construct the PC-DA cluster sets. The first step in principal
component analysis is the extraction of eigenvectors from the variance/covariance matrix to
obtain a number of orthogonal sets of new variables, called principal components, that are
optimized in their ability to explain a maximum amount of variance in the original data. In
highly correlated data, a few of the top ranking principal components will be sufficient to
reproduce the significant variance in the original data set. PCA was applied to reduce the number
of features needed to investigate the partial linear fit (PLF) aligned NMR spectra of the control
and APOE*3 Leiden mice. Projections of the samples onto the first fifteen principal component
axes were then used as starting point for linear discriminant analysis.
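
The eigenvector extraction and subsequent discriminant step described above can be sketched in a few lines. This is a minimal illustration only, not the WINLIN implementation: it extracts just the first principal component by power iteration (the text uses fifteen), and it replaces linear discriminant analysis with a simple midpoint between the two class means, which is a simplifying assumption. All function names and the intensity values are invented for the example.

```python
import math

def covariance_matrix(rows):
    """Return the sample variance/covariance matrix and the mean-centered rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centered

def leading_eigenvector(matrix, iterations=500):
    """Power iteration: converges to the eigenvector with the largest eigenvalue."""
    p = len(matrix)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pc1_scores(rows):
    """Project each mean-centered sample onto the first principal component."""
    cov, centered = covariance_matrix(rows)
    pc1 = leading_eigenvector(cov)
    return [sum(c[j] * pc1[j] for j in range(len(pc1))) for c in centered]

def discriminant_threshold(scores, labels):
    """Midpoint between the two class means of the scores (1-D stand-in for LDA)."""
    classes = sorted(set(labels))
    means = [sum(s for s, l in zip(scores, labels) if l == c) /
             sum(1 for l in labels if l == c) for c in classes]
    return (means[0] + means[1]) / 2.0

# Invented peak-intensity rows (three features per spectrum), for illustration only.
spectra = [[1.0, 2.0, 0.5], [1.2, 2.1, 0.4], [0.9, 1.9, 0.6],
           [3.0, 4.0, 0.5], [3.2, 4.2, 0.6], [2.9, 3.9, 0.4]]
labels = ["WT", "WT", "WT", "TG", "TG", "TG"]
scores = pc1_scores(spectra)
threshold = discriminant_threshold(scores, labels)
sides = [s > threshold for s in scores]  # which side of the threshold each sample falls on
```

In this toy data the two groups differ mainly in the first two features, so the first principal component alone places the WT and TG samples on opposite sides of the threshold.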






Factor spectra were used to correlate the position of clusters in the score plots to the
original features in the spectra by a graphical rotation of the loading vectors. [Windig et al.,
Anal. Chem. 56, 2297 (1984).] The difference factor spectrum plot, shown in Figure 38, is
characterized by a number of lines representing various metabolic components defined by a
range of contribution factors, specifically, ion m/z's that facilitated clustering of the transgenic
and control mouse populations. The height of the lines above and below the axis of the plot is
directly related to the amplitude of the contribution to the overall variance where the factors
extending below the axis correspond to higher spectral intensities in the transgenic animals.
Since PC-DA separates clusters in a single unique direction, lines projecting below the central
axis represent NMR spectral pattern components of higher intensity in the plasma of transgenic
mice. The lines extending above the central axis symbolize [illegible] at higher absolute
concentrations relative to the control group.

Factor spectra prepared in directions of maximum separation of the two categories were
used to give an insight into the type of metabolites responsible for the separation of the observed
categories. Preliminary results based on the PC-DA loading plots point to the δ 3.8 ppm - δ 4.2
ppm region and the lipid region (δ 1.[illegible] ppm - δ 0.8 ppm) as the primary contributors to
quantitative variance between Leiden and control samples.

The limitations of NMR spectroscopy result from the low inherent sensitivity of the
technique and from the high complexity and information content of NMR spectra. The
sensitivity of the technique is also affected by the minimum threshold concentrations of
compounds being detected. Regardless of its limitations, it is clear that NMR based metabolome
profiling coupled to pattern recognition technology is a powerful analytical approach for
integration of metabolic data into a comprehensive systems-level analysis. In this study,
however, the purpose of the NMR screen was not to identify specific molecules, but rather to use
the method to determine whether a qualitative degree of differentiation between sample
populations exists.

Simultaneous analysis of metabolic and protein components yields expected and
novel patterns. Metabolite extracts from plasma of transgenic (n=4) and control (n=4) mice
were prepared by the isopropanol precipitation method. Upon addition of 400 µL of water to
100 µL of extract, the samples were subjected to LC/MS analysis. Figure 39 depicts TICs that
were collected using single scan mode over the 400-1700 m/z mass range. To apply statistical
analysis to the LC/MS spectra, the raw data files were first converted to NetCDF format and
processed using IMPRESS noise reduction and normalization software. The program evaluates
each mass trace for its chromatographic quality by assessing its information content. This is
performed after smoothing to remove spikes and by calculating the entropy for each m/z of the
trace according to Equation 12. Mass intensities normalized by IMPRESS are assigned a scaled
[illegible] quality number, or the IQ. To perform principal component analysis, the IQ
[illegible] in Figure 39 were imported into WINLIN, and discriminant analysis
separation was obtained based on two initial principal component vectors.
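
The idea of scoring each mass trace by its information content can be illustrated with a short sketch. Equation 12 is not reproduced in this passage, so the Shannon entropy used here is an assumption rather than the IMPRESS formula, and the running-median spike filter, the function names and the trace values are likewise invented for illustration.

```python
import math

def median_smooth(intensities, window=3):
    """Running median that suppresses single-scan spikes."""
    half = window // 2
    smoothed = []
    for i in range(len(intensities)):
        lo, hi = max(0, i - half), min(len(intensities), i + half + 1)
        neighbourhood = sorted(intensities[lo:hi])
        smoothed.append(neighbourhood[len(neighbourhood) // 2])
    return smoothed

def trace_entropy(intensities):
    """Shannon entropy (bits) of one mass trace, treated as a distribution over scans."""
    total = sum(intensities)
    if total <= 0:
        return 0.0
    probs = [x / total for x in intensities if x > 0]
    return -sum(p * math.log2(p) for p in probs)

peak_trace = [0, 0, 1, 8, 20, 8, 1, 0, 0, 0]    # one chromatographic peak
noise_trace = [3, 4, 3, 4, 3, 4, 3, 4, 3, 4]    # flat background
spike_trace = [0, 0, 50, 0, 0]                  # single-scan spike

peak_entropy = trace_entropy(median_smooth(peak_trace))
noise_entropy = trace_entropy(median_smooth(noise_trace))
spike_entropy = trace_entropy(median_smooth(spike_trace))
```

Under this assumed definition a trace dominated by one chromatographic peak has lower entropy than a flat background trace, and an isolated spike is removed by the smoothing step before the entropy is computed.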

The proteomic whole plasma analysis was biased toward fractions containing
lipoprotein complexes. This was in line with expectations that most statistically relevant
changes associated with the Leiden mutation will occur in this class of proteins, based on the
transgenic model selected. Whole plasma samples from the transgenic (n=4) and control (n=4)
animals were fractionated by analytical size exclusion chromatography [illegible]
corresponding to high molecular weight [illegible]
experimental protocol. Two major early peaks eluted at 23 minutes and 27 minutes,
corresponding to VLDL (fraction 1) and HDL (fraction 2) components of whole plasma,
respectively, were used for all subsequent manipulations. Proteins contained in fractions 1 and 2
were treated with trypsin to generate proteolytic peptides.

TICs of the VLDL fractions from LC/MS analysis are shown in Figure 40 for the
wild type mouse (WT) and the transgenic mouse (TG). MS/MS spectra collected for all eight
representative samples were analyzed by TurboSEQUEST to generate hits against NCBI
nonredundant human and mouse databases. The identities of these initial hits were further
verified using the MASCOT de novo sequencing and database search tool. The threshold for
assigning protein identities was based on the minimal sequence coverage set at 20% of total
residue count. The protein MS data were clustered in a way similar to the metabolic component
by generating IQ value spectra followed by discriminant analysis.
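
The 20% sequence coverage threshold can be made concrete with a short sketch. This is an illustrative reading of "minimal sequence coverage", not the TurboSEQUEST or MASCOT procedure; the function names, the protein length and the peptide positions are assumptions chosen for the example.

```python
def sequence_coverage(protein_length, peptide_spans):
    """Fraction of residues covered by at least one identified peptide.

    peptide_spans holds (start, end) residue positions, 1-based and inclusive.
    """
    covered = set()
    for start, end in peptide_spans:
        covered.update(range(start, end + 1))
    return len(covered) / protein_length

def accept_identity(protein_length, peptide_spans, minimum=0.20):
    """Accept a protein identity only when coverage reaches the minimum."""
    return sequence_coverage(protein_length, peptide_spans) >= minimum

# Hypothetical 300-residue protein with overlapping peptide hits.
few_peptides = [(10, 30), (25, 50), (200, 215)]   # 41 + 16 = 57 residues covered
more_peptides = few_peptides + [(100, 110)]       # 68 residues covered

rejected = accept_identity(300, few_peptides)
accepted = accept_identity(300, more_peptides)
```

Overlapping peptides are counted once, so the first set covers 57 of 300 residues (19%) and falls short, while adding one more peptide brings coverage to about 23% and passes.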

To observe quantitative relationships between metabolic and protein components of
plasma, an assembly of concatenated heterogeneous data sets was used. Original individual data
sets were integrated separately and IMPRESS quality m/z values from these sets were summed
and subjected to the statistical clustering analysis. The resulting score plot, which is illustrated
in Figure 41, shows PC-DA clusters for the wild type (WT) and transgenic (TG) animals
generated based on two principal components rotated to achieve maximum separation in D1.
Each point represents linear combination of metabolite and protein variance factors (60% of
original data set) for the individual animals.






Filtered m/z intensities from metabolite and peptide spectra were organized in a linear
fashion in the factor plot, shown in Figure 42. Linear distribution along the central axis
represents protein and metabolite components with calculated bi-directional contributions to
variance between the control and transgenic groups. Main positively contributing factors are
seen projecting above the [illegible] cut-off weight of 50. Negative contributors to the overall
variance project below the -50 set boundary.

By adding nominal values of 1601 and 3401 to each m/z value in the second protein and
the metabolic components, respectively, heterogeneous experimental data was analyzed in
parallel, as shown in Figure 42. Significant contribution intensities were scored based on the
factor plot specific threshold parameter, which was set to 50 in this instance. The masses that
were found to be major differentiators between the WT and TG data sets were extracted and
identified by LC/MS/MS. The combination intensities (raw data and IQ scores) of
differentiating factors were measured directly in the LC/MS [illegible] for statistical
significance (p<0.05) and calculation of fold change.
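
The offset scheme described above, in which each data block is shifted along a shared m/z axis so that heterogeneous data can be analyzed together, can be sketched as follows. The offsets 1601 and 3401 and the cut-off of 50 come from the text; the block names, the factor weights and the assumption that the m/z values in each block stay below the gap to the next offset are illustrative only.

```python
# Offsets quoted in the text; the block names are labels chosen for this sketch.
OFFSETS = {"protein_1": 0, "protein_2": 1601, "metabolite": 3401}
CUTOFF = 50.0  # factor-plot threshold quoted in the text

def concatenate(blocks):
    """Merge {block: {m/z: weight}} into a single {shifted m/z: weight} axis."""
    merged = {}
    for name, weights in blocks.items():
        for mz, weight in weights.items():
            merged[mz + OFFSETS[name]] = weight
    return merged

def origin(shifted_mz):
    """Map a shifted position back to (block name, original m/z).

    Assumes every original m/z is smaller than the gap to the next offset.
    """
    name = max((n for n, off in OFFSETS.items() if off <= shifted_mz),
               key=lambda n: OFFSETS[n])
    return name, shifted_mz - OFFSETS[name]

def differentiators(merged, cutoff=CUTOFF):
    """Keep factors whose weight lies above +cutoff or below -cutoff."""
    return {mz: w for mz, w in merged.items() if abs(w) > cutoff}

# Invented factor weights, for illustration only.
blocks = {
    "protein_1": {812: 75.0, 1040: 12.0},
    "protein_2": {655: -64.0},
    "metabolite": {496: 58.0, 760: -20.0},
}
merged = concatenate(blocks)
hits = differentiators(merged)
```

Here three factors exceed the cut-off in magnitude, and `origin(2256)` recovers that the shifted position 2256 came from m/z 655 in the second protein component.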

The results point to a composite profile that corroborates previous findings with respect
to lipoprotein and lipid abnormalities associated with the APOE*Leiden phenotype.
[Mensenkamp et al., J. Hepat. 33, 189 (2000); van den Maagdenberg et al., J. Biol. Chem. 268,
10540 (1993); Williams van Dijk et al., Arterioscler. Thromb. Vasc. Biol. 19, 2945 (1999); and
Mensenkamp et al., J. Biol. Chem. 274, 35711 (1999).] Specifically, at the protein level we
were able to show that human APOE*3 Leiden allelic variant is expressed and functionally active
in the transgenic animals as evidenced by its incorporation into VLDL (protein component 1 in
Figure 42) and LDL/HDL (protein component 2 in Figure 42) fractions of plasma derived
lipoproteins. Alternatively, murine ApoA1 has been found to be twofold less abundant in the
plasma of transgenic mice indicating lower degree of incorporation of the apolipoprotein into the
LDL/HDL complexes in these animals.

Although the underlying processes governing HDL metabolism have not been fully
defined, HDL levels in plasma have been shown to have inverse relationship with atherosclerosis
susceptibility. [Callow et al., Genome Res. 10, 2022 (2000); and Glass and Witztum.] A
number of different mechanisms can control HDL plasma levels. Most prominent factors
identified in mouse models that contribute to lowering plasma HDL include defects in apoA1,
apoE, phospholipid transfer protein (PLTP) and the overexpression of cholesteryl ester transfer
protein (CETP) or scavenger receptor SRB. [Callow et al.; Williamson et al., Proc. Natl. Acad.
Sci. USA 89, 7134 (1992); and Wang et al., J. Biol. Chem. 273, 32920 (1998).] Assuming that the
Leiden mutation is functionally analogous to a defective APOE allele, it is highly likely that, in
the context of the Leiden model, the lower HDL levels are at least partially the result of the
ApoE*3 transgene function. One possibility for decrease in total endogenous ApoA1 is the
steric [illegible] recruitment for LDL/HDL assembly.

This study demonstrates the utility of a multilevel approach for characterization of a
highly complex system. By [illegible] content comparing integrated
principal component vectors [illegible] of identities and [illegible] lipoprotein [illegible]
metabolome data to explain disease. In the future, it [illegible] by
including [illegible] tissue [illegible] effects of gene perturbations.

Example 5. Systems biology approach: Metabolic Disease Study

Summary. The overall goal of this example is to demonstrate [illegible] and
data integration capabilities according to the invention. The general area of medical interest was
metabolic [illegible] the materials [illegible]
(rodent and non-human primate) and from human subjects. A subset of each group of rodents
(diseased and control) [illegible] the project (Phase I), the
testor was aware that there were three sample sources (rodent, non-human primate, and human)
but was blinded [illegible].

The specific objectives of the study were as follows.

Phase I

■ to undertake metabolite and protein analyses of blinded serum samples from animal and
human subjects; and

■ to group the samples based on the serum metabolite and protein profiles.




Phase II

■ after unblinding, to compare the grouping of the samples as determined with the actual
sample groups;

■ to define, for each of the sample types, molecular components (biomarkers) that can be
used to differentiate one group of samples from another;

■ to construct correlation networks for [illegible] gain insight into the
biochemical processes underlying the disease or drug treated phenotypes; and

■ to determine whether molecular components which differentiate diseased rodents from
control rodents are similar to those which differentiate diseased human patients from
control human subjects.

Blinded analyses of the metabolite and protein profiles for the rodent serum samples revealed
four clearly distinct groups that, upon unblinding, corresponded exactly to the actual groups of
samples (Diseased + vehicle, Diseased + drug, Control + vehicle, Control + drug). Blinded
analyses of the profiles for the non-human primate samples revealed two distinct groups that,
upon unblinding, corresponded exactly to the diseased and control groups. For the human
samples, blinded analyses of the metabolite and protein profiles revealed different numbers of
groups (4 or 2), depending upon the analytical platform employed. Analysis based only on lipid
profiles revealed two groups that, upon unblinding, corresponded with 86% accuracy to the
diseased patients and with 89% accuracy to the control subjects.

A large number of metabolites and proteins were identified that differentiated between
the groups of animal and human serum samples. The relative levels of these biomarkers in the
samples provided insight into the biochemical processes underlying the disease or drug response.
One of the notable findings was the effect in the diseased rodents of the drug treatment on serum
protein levels. A second, distinct finding was the almost identical widespread changes in the
levels of over 150 serum lipids in both the diseased rodents and the diseased patients relative to
the levels in the corresponding control subjects. As a validation of the rodent model as a model
of the human disease, the testor was also able to use the set of serum lipid biomarkers found to
correctly classify diseased and control rodents to distinguish with good precision the diseased
patients from the control human subjects.

Introduction. The overall goal of this example was to provide a basis to assess
integrated platforms of proteomics, metabolomics and informatics technologies as applied to
comparative studies of pre-clinical and clinical serum samples. Serum samples were provided
from a drug-treatment study in a rodent model of metabolic disease, a comparative study of
metabolic disease in human [illegible] of a related condition in non-human primates.

The study was divided into two phases. In Phase I, the testor was blinded with respect to
sample information and performed comparative quantitative profiling of metabolites and proteins
using a combination of NMR and MS techniques. Informatics methods such as unsupervised
clustering [illegible] the data to determine if the experimental [illegible] be
accurately discriminated. At the conclusion of Phase I, the data was unblinded, and it was
revealed that [illegible]
emphasis of the [illegible]
as well as [illegible]
one another, [illegible]
rodent [illegible]
the human disease [illegible]. [illegible] only certain results in order
to exemplify the [illegible].

Sample information. In Phase I of the study, the testor was blinded with respect to
[illegible]
subjects. Unblinding of the sample information was done prior to Phase II. The experimental
groups and numbers of samples are listed below.

A. Drug treatment study in a rodent model of metabolic disease: A total of 32 serum
samples [illegible] study where a therapeutic [illegible] administered
to diseased rodents and non-[illegible].

n = 8 control treated with vehicle
n = 8 control treated with drug
n = 8 diseased treated with vehicle
n = 8 diseased treated with drug

B. Comparative study of metabolic disease in human subjects: A total of 42 serum samples
(300-400 µL per sample) from [illegible] disease and controls
were subdivided as follows.

n = 14 Subjects diagnosed with metabolic disease
n = 28 Controls

C. Disease study of non-human primates: A total of 24 serum samples (300-850 µL per
sample) from non-human primates were profiled.

n = 13 Normal non-human primates (monkeys)
n = 12 Diseased non-human primates (monkeys)

Methods utilized - Analytical profiling. The approach in the Example to differential
proteomics and metabolomics employs several distinct analytical methods that enable the
quantitative profiling of a wide range of molecular components. These methods utilize either
NMR or MS as analytical endpoints. Profiling platforms have been optimized taking into
account robustness, reproducibility, sensitivity, and dynamic range and are designed to survey
molecules that may span orders of magnitude in abundance as well as a range of biochemical
classes. Each platform has the capacity to profile many components (hundreds to thousands)
within a single analysis, and software tools were used to facilitate the extraction of quantitative
information for [illegible] computational and informatics analyses. Methods applied in this
study are listed below.

1. Protein LC/MS: allows profiling and identification of peptides and proteins.
2. CPMG NMR: enhanced NMR measurement of low molecular weight metabolites.
3. Diffusion-edited NMR: enhanced measurement of lipoprotein-associated metabolites.
4. Lipid LC/MS: optimized for profiling of lipids and non-polar metabolites.

Methods utilized - Data processing. The resultant NMR spectrum or LC/MS
chromatogram obtained from a profiling experiment may contain many hundreds of peaks that
represent the relative abundance of hundreds of molecules. Data processing software tools are
used to enable the extraction of this information from each data file as well as the comparison of
measured peak intensities across the sample set. As described above, typically, data processing
steps include peak detection and measurement of relative intensities (peak integration), an
"alignment" step to compensate for minor differences in peak position that might occur from one
sample analysis to another (i.e., small differences in NMR chemical shift or LC/MS retention
time for a particular peak), and assignment of an identifier (or index number) to each peak so
that it might be compared across samples.
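
The alignment and indexing steps described above can be sketched as a simple tolerance-based grouping. This is only one possible approach and is not the software used in the Example; the function name, the tolerance value and the peak positions are assumptions chosen for illustration.

```python
def align_peaks(samples, tolerance):
    """Group peaks from all samples into shared index numbers.

    samples: list of {position: intensity} dicts, one per sample.
    A peak joins the current group when its position lies within
    `tolerance` of that group's running mean position.
    Returns (centres, table), where table[i][k] is the intensity of
    peak index k in sample i (0.0 when the peak was not detected).
    """
    observations = sorted((pos, i, inten)
                          for i, peaks in enumerate(samples)
                          for pos, inten in peaks.items())
    groups = []  # each group: list of (position, sample index, intensity)
    for obs in observations:
        if groups:
            centre = sum(o[0] for o in groups[-1]) / len(groups[-1])
            if abs(obs[0] - centre) <= tolerance:
                groups[-1].append(obs)
                continue
        groups.append([obs])
    centres = [sum(o[0] for o in g) / len(g) for g in groups]
    table = [[0.0] * len(groups) for _ in samples]
    for k, group in enumerate(groups):
        for _, i, inten in group:
            table[i][k] += inten
    return centres, table

# Two invented runs whose peaks are shifted by 0.01 units relative to each other.
runs = [{1.30: 10.0, 3.81: 4.0}, {1.31: 12.0, 3.80: 5.0, 4.20: 2.0}]
centres, table = align_peaks(runs, tolerance=0.02)
```

After alignment, each column of `table` is one indexed peak, so intensities can be compared across samples even though the raw positions differed slightly.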

Methods utilized - Data analysis. The data were analyzed using several different
statistical approaches: (1) unsupervised clustering of samples (including COSA hierarchical
clustering), (2) univariate statistics to determine peaks that are different between groups of
samples, and (3) correlation network analysis to identify correlations between individual
components of metabolite and protein sets for all samples. In addition, some preliminary data
analyses with a support vector machine (SVM) classifier for the purpose of classification were
undertaken. Figure 43 is a schematic representation of the data analysis workflow. Elements of
the data analysis process are listed below in the order they are performed.
1. [illegible]

2. Application of exploratory clustering methods:

- COSA

- Principal Components

- K-Means Clustering

- Neural network (human samples only).

3. [illegible] for identification: determine [illegible]
identification.

4. Correlation Networks: determine correlations among pairs [illegible].

5. [illegible]: use software tools to incorporate database information with the [illegible].

Results and discussion for the rodent model of metabolic disease regarding analyses
of serum samples - Unsupervised clustering. [illegible]
clustering [illegible]
statistical [illegible] samples with no foreknowledge of sample classification.
In general, multiple data sets from multiple analytical platforms were
[illegible]
clustering analysis. In this [illegible] showed
appropriate clustering, the data sets were concatenated and/or integrated and/or correlated to
obtain an even more robust analysis. The concatenated data was normalized and clustered, and
the results were recorded as a profile of a biological system.

Data collected from all individual platforms resulted in clustering of blinded serum
[illegible] clusters formed. Clustering into four groups was observed with both the protein and
lipid platforms. These four groups that were ultimately identified consisted of samples 1-8,
9-16, 17-24, and 25-32.

The clustering of the LC/MS proteomic data (i.e., a single analytical platform) is
illustrated in Figure 44A. Figure 44A is an example of the COSA clustering analysis of rodent
serum proteomic LC/MS analysis, after data alignment and normalization. In this analysis, the
2,977 peaks that appeared in at least 28/32 rodents (>87% of the samples) were used for
clustering. Data obtained from the other metabolite platforms, CPMG NMR and Diffusion-
edited NMR, clustered the samples into fewer groups but the divisions were consistent with the
groups found during the lipid and protein analyses.
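
The peak selection rule quoted above (peaks present in at least 28 of 32 samples) amounts to a simple presence filter. The sketch below assumes an aligned intensity table in which an undetected peak is stored as zero; the function name and the toy table are invented for illustration.

```python
def frequent_peaks(table, min_present):
    """Return column indices of peaks detected (non-zero) in at least min_present samples."""
    n_peaks = len(table[0])
    return [k for k in range(n_peaks)
            if sum(1 for row in table if row[k] > 0) >= min_present]

# 32 invented samples and 3 peaks: detected in 32, 28 and 27 samples respectively.
table = [[1.0, 1.0 if i < 28 else 0.0, 1.0 if i < 27 else 0.0] for i in range(32)]
kept = frequent_peaks(table, min_present=28)
fraction = 28 / 32  # 0.875, consistent with the ">87% of the samples" figure in the text
```

Only the first two peaks survive the filter; the third is present in 27 samples and is dropped before clustering.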

Figure 44B shows a more robust representation of the four groups (as described above).
Figure 44B is the result of COSA clustering applied to combined data from all platforms.
Clustering using CPMG NMR data only revealed three clusters while using DE NMR data only
revealed two clusters (not shown). Combining data from proteomics, lipid LC/MS, CPMG
NMR and DE NMR (4851 variables) [illegible]. The groupings were
consistent with the results of the individual treatments of the proteomics data and the lipid
profiling data.

Unblinding the samples revealed that groups delimited using these methods corresponded
exactly to the different rodent cohorts as summarized in Table 1 below.

Table 1. Sample Identification Provided After Cluster Analysis

Sample ID    Cohort

1 - 8        diseased rodents treated with vehicle (DISveh)
9 - 16       diseased rodents treated with drug (DISdrug)
17 - 24      control rodents treated with vehicle (CONveh)
25 - 32      control rodents treated with drug (CONdrug)

Results and discussion for the rodent model of metabolic disease regarding analyses
of serum samples - Metabolite and peptide peak identification. Univariate statistical
methods were applied to the peaks profiled in Phase I to select, for subsequent identification,
those peaks which exhibited differing abundances among the four groups of rodents. The
primary statistical analysis consisted of a pairwise t-test with a significance level α = 0.05. The
workflow for this analysis is outlined in Figure 45. In general, multiple data sets from multiple
analytical platforms were concatenated, integrated, and correlated, and then normalized.
Statistically different components between the disease and control groups were extracted, and the
difference was quantified. Then, the system was perturbed by administering a drug to the
diseased group, and a similar analysis was undertaken to determine the differences between the
treated and control groups. Finally, all of the components identified were compared between the
two experiments to obtain a profile of the biological system.
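
The pairwise t-test screen can be illustrated for a single peak measured in the four rodent groups. The text does not say which form of the t-test was used, so the pooled-variance (Student) form below is an assumption, as are the function names and the invented intensities. With eight animals per group each comparison has 14 degrees of freedom, for which the two-sided critical value at α = 0.05 is approximately 2.145; no multiple-testing correction is applied in this sketch.

```python
import math
from itertools import combinations

T_CRIT_DF14 = 2.145  # two-sided critical value of Student's t, alpha = 0.05, 14 d.f.

def t_statistic(a, b):
    """Pooled-variance (Student) two-sample t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    pooled = (ssa + ssb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))

def pairwise_differences(groups, t_crit=T_CRIT_DF14):
    """Return the pairs of group names whose means differ at the chosen level."""
    return [(g1, g2) for g1, g2 in combinations(sorted(groups), 2)
            if abs(t_statistic(groups[g1], groups[g2])) > t_crit]

# Invented normalized intensities of one peak, eight animals per group.
peak = {
    "CONveh":  [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7],
    "CONdrug": [10.0, 10.2, 9.9, 10.1, 9.8, 10.3, 10.0, 9.9],
    "DISveh":  [14.9, 15.2, 15.1, 14.8, 15.3, 15.0, 14.7, 15.1],
    "DISdrug": [11.9, 12.2, 12.0, 11.8, 12.1, 12.3, 11.7, 12.0],
}
significant = pairwise_differences(peak)
```

In this toy example the two control groups are indistinguishable, while every comparison involving a diseased group is flagged, so the peak would be passed on for identification.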






A representative excerpt showing differences observed among metabolites and peptides
is shown in Figure 45A. (These components may also be observed in the correlation network
analysis (Figure 46) where they display correlations among themselves as well as with other
identified peptides and metabolites.) By viewing the data in this representation, one can see, for
example, that levels of two serum proteins (Protein 1 and Protein 2) were found to be
differentially and oppositely regulated between diseased and control rodents (vehicle treated),
and the treatment [illegible] diseased [illegible] Protein 1 levels to that of the control
animals while increasing [illegible] higher than the controls.

Another interesting observation [illegible]

1. diseased + vehicle / control + vehicle ... Effect of disease.
2. diseased + drug / diseased + vehicle ... [illegible]
3. diseased + drug / control + drug ... Comparison of drug-treated disease with [illegible]
4. diseased + drug / control + vehicle ... Comparison of drug-treated disease with
untreated control.
5. control + drug / control + [illegible]

This is the order of presentation for all analyses of the rodent serum samples throughout the
Example for [illegible].

Results and discussion for the rodent model of metabolic disease regarding analyses
of serum samples - Correlation network analysis. In addition to changes in component
abundance levels [illegible]
components is useful to reveal important relationships among the various components studied.
Such a correlation analysis is complementary to abundance level information, and often provides
information about the biochemical processes underlying the disease or drug response.

Figure 46 is a representative correlation network derived from the proteomic,
metabolomic and clinical chemistry data in the pairwise comparison of the eight diseased drug-
treated rodents and the eight diseased vehicle-treated rodents (drug effect on disease state). As
can be seen in the legend, the components (or 'nodes') of the network are the various proteins,
metabolites or clinical chemistries measured by the various platforms. All of the nodes in this
figure, and in figures similar to this one, are components which have: (i) been identified, and (ii)
exhibited a fold-change greater than ±15% with p < 0.05.

There are a number of independent levels of information displayed in this type of
correlation network. First, the particular shape of a node represents the platform that was used to
measure the component. For example, in Figure 46, the square shaped nodes are peptides which
have been measured and identified (i.e., sequenced and validated) by mass spectrometry.
Second, the shading of a given node reflects the abundance difference in the sera of the two
groups being compared; this is a normalized group mean difference. Third, the lines between
pairs of nodes represent correlations in which the Pearson coefficient is between 0.80 and 1.00,
or -0.80 to -1.00. Negative correlation values are presented as light lines, while positively
correlated components are connected visually by dark lines in [illegible].
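
The edge rule described above (connect two nodes when the Pearson coefficient is at least 0.80 in magnitude, with dark lines for positive and light lines for negative correlations) can be sketched directly. This illustration covers only the edge construction; the node selection by fold-change and p-value is not implemented, and the node names and levels are invented for the example.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_edges(levels, threshold=0.80):
    """Return (node_a, node_b, r, style) for every pair with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(levels), 2):
        r = pearson(levels[a], levels[b])
        if abs(r) >= threshold:
            edges.append((a, b, r, "dark" if r > 0 else "light"))
    return edges

# Invented component levels across five samples.
levels = {
    "peptide_1":    [1.0, 2.0, 3.0, 4.0, 5.0],
    "lipid_1":      [2.1, 3.9, 6.2, 8.0, 9.9],
    "lipid_2":      [5.0, 4.1, 2.9, 2.2, 1.0],
    "metabolite_1": [3.0, 1.0, 4.0, 1.0, 5.0],
}
edges = correlation_edges(levels)
styles = {(a, b): style for a, b, _, style in edges}
```

Here `peptide_1` and `lipid_1` are joined by a dark (positive) line, `lipid_2` is joined to both by light (negative) lines, and `metabolite_1` remains unconnected because none of its correlations reaches the threshold.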

Generally speaking, two components which are positively correlated exhibit a statistically
significant mutual behavior characterized by a change in one component being concomitantly
related to a similar change in the second component, across all samples in the group. A trivial
example may be pairs of peptide components from the same protein which behave similarly, or
two NMR resonance components from [illegible]. [illegible] correlations
may also be observed, such as between metabolites that are part of the same biosynthetic
pathway or between entities that are components of the same macromolecular structure. An
example of this type of correlation is shown in Figure 46, where the Protein 2 peptide is highly
positively correlated with a number of lipid components in the serum; this high degree of
correlation suggests that these lipids may share the same lipoprotein origin as Protein 2 in serum.
Negative correlations may, for example, arise between components that are part of the same
pathway, but where they might be separated by a point of enzyme inhibition or substrate
limitation. In addition, components that fall past committed biosynthetic branch points may
show negative correlations with one another.

The overall topology of the structure is what is referred to as self assembling and reflects 
clusters of components which are highly inter-correlated. Those nodes which are close to one 
another reflect a particularly high density of mutual correlation. The topology is generated in an 
unsupervised and automated fashion. 

By investigating such structures, a number of interesting observations become apparent.
For example, it is seen that Lipid 2 is higher in abundance upon treatment (the node is at
approximately 4 o'clock in the largest circular structure), and furthermore it is negatively
correlated with many other lipid components. It should be understood that this figure is
illustrative of the principles and techniques of the invention; it is one of many such correlations
that are possible.

Results and discussion for the rodent model of metabolic disease regarding analyses
of serum samples - Heat plot analysis. An alternate view of the correlation information for the
comparison of diseased drug-treated and diseased vehicle-treated groups is shown in Figure 47.
This 'heat plot' shows correlation coefficients calculated for each pairing of
identified metabolite and peptide peaks. The color of the off-diagonal spot for a pair of
component peaks corresponds to the sign of the correlation coefficient ([illegible]
positive, [illegible] negative), and the color intensity is proportional to the magnitude of the
correlation. Though complex, this visualization enables a rapid inspection of the complete array
of correlations. When the components are grouped according to analytical method as shown in
Figure 47, correlations between different component classes are apparent. For example, the off-
diagonal area that aligns [illegible] of index numbers [illegible] and lipids of index numbers
110-140 shows [illegible]. In this case, the
positively correlated peptides (22-2[illegible]) are from Protein 1 while the lipids are
triglycerides. Note that fold-change information is not represented in Figure 47; the shade scale
represents the Pearson correlation coefficient.
Results and discussion for the rodent model of metabolic disease regarding analyses 
of serum samples - Rodent protein ratios. Certain proteins play 

metabolism. It is dierefore not surprising that differences in the levels of peptics asspbiated 
v^so^ as part of this 

study. .Figure 48 illustrates the differences in four^such prdteins, Protein A (Protein lVProtein 
B 9 ^teih G and Protein D (Protem 2), represented as ratios between different ffbups. Six 
tryptic peptides were observed from Protein A, onedfrom Protein Bj one from Protein Candtwb 
from Protein D. The plot in Figure 48 shows ratios between groups;based on the means of the 
peak?intensity values vrittun each group (after noimali^tion and scaling). It is apparent that 
signified fold changes exist between the different groups. Particularly striking are the Protein 
i&rafio changes between diseased rodieaits treated with drug and diseased rodents treated with 
vehicle as well as between the diseased rodents treated with vehicle and the control subgroup of 
rodents treated with vehicle. 

Results and discussion for the metabolic syndrome study regarding analyses of
human serum samples - Unsupervised clustering. Unsupervised clustering was applied to the
human data derived using all individual platforms, protein, lipid, and NMR. As mentioned
above for the rodent model of metabolic disease, this allows grouping of samples with no
foreknowledge of sample classification or the number of distinct groups. COSA analysis of the
peptide data grouped the samples into four weak clusters. Clustering using the NMR global
metabolite data split the samples into two groups. Once the sample information was unblinded it
was apparent that these groupings did not correspond to the diseased vs. control cohorts.

In contrast, COSA analysis of lipid data suggests two clusters (Figure 49). The COSA
distance clustering used 779 human LC/MS lipid peaks. These clusters correspond to the
diseased patients with 86% accuracy (12/14) and the control subjects with 89% accuracy (25/28).
Multivariate analysis indicated that lipids were the strongest discriminator between diseased and
control samples.
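
The accuracy figures quoted above follow directly from the stated counts. The short calculation below reproduces them; the function name is illustrative, and the pooled figure is simply derived from the same counts and is not stated in the text.

```python
def class_accuracy(correct, total):
    """Percentage of a known class that fell into the matching cluster."""
    return 100.0 * correct / total

diseased = class_accuracy(12, 14)            # about 85.7, quoted as 86%
control = class_accuracy(25, 28)             # about 89.3, quoted as 89%
overall = class_accuracy(12 + 25, 14 + 28)   # about 88.1 for all 42 samples pooled
```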

The lack of strong clustering in 2 out Of the 3 plaffonns indicates that clustering is 
dominated by other factors such as medications, gender, age of environment Given these weak; 
clusters derived using COS A ter some of the platforms, other clustering techniques, such as Kr 
Means and neural networks, were investigated using the same data set. These ^hniques gave 

15 results similar to COSA, with the exception of a few samples at the boundaries between groups. 
Results and discussion for the metabolic syndrome study regarding analyses of
human serum samples - Metabolite and peptide peak identification. As was seen in the
rodent study, potentially interesting peaks can be found by highlighting those that differ
significantly in level between sample types. For the purpose of this study, the human samples
were first divided into the two groups (14 diseased patients and 28 control subjects). A two-
sample t-test was performed for each peak to test for mean differences between the two groups,
and this resulted in a list of peaks submitted for identification.
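
The per-peak screening described above can be illustrated with a minimal sketch. It assumes a pooled-variance (Student) two-sample t statistic, which the text does not specify for this study, and ranks peaks by the absolute value of the statistic rather than by p-value. The peak names and intensities are hypothetical.

```python
import math

def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance (assumed test form)."""
    na, nb = len(a), len(b)
    mean_a = sum(a) / na
    mean_b = sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    # Note: a peak with zero variance in both groups would divide by zero here.
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))

def rank_peaks(diseased, control):
    """Both arguments map peak_id -> list of intensities.
    Returns (peak_id, t) pairs sorted by |t|, largest first."""
    scores = [(peak, pooled_t_statistic(diseased[peak], control[peak]))
              for peak in diseased]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical intensities for two peaks.
diseased = {"peak_1": [5.1, 5.3, 4.9, 5.2], "peak_2": [2.0, 2.1, 1.9, 2.0]}
control = {"peak_1": [3.0, 3.2, 2.9, 3.1], "peak_2": [2.0, 1.9, 2.1, 2.05]}
ranked = rank_peaks(diseased, control)
```

Peaks at the top of `ranked` would be the first candidates submitted for identification; converting the statistic to a p-value requires a t-distribution, which is omitted here.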

For the lipid platform, a subset of peaks that exhibited differences between diseased
patients and control subjects was identified using a reference database as well as targeted
MS/MS methods. In general, upon peak identification, it was found that the levels of certain
lipid molecules in diseased patients were significantly different from the levels of these lipids in
control subjects. Interestingly, as seen in the rodent/human comparison study below, many of
these lipid levels are also significantly different in diseased rodents compared to control rodents.
Additionally, a list of human proteins was identified as part of this study using the
"shotgun" tandem mass spectrometry (MS/MS) method. There was no overlap between the set
of peaks which were selected during the MS profiling stage for sequencing by shotgun MS/MS
and the set of peaks which exhibited statistically significant level differences between the two
groups of human serum samples.




Results and discussion for the comparison of rodent samples with human samples. 

In this portion of the study, the objective was to compare the lipid components in the serum from
c dise^edv^cle^tre^ed 

'Sr- auaaj^ses; '^evdata &m fi^ sei^ U^ 
:pei^ : c»r^ 

Irrthi^ l^a first i^ie qqnb©^^ 

^s;clus^ 

: 10 ^ . " ( Results and discuss 

Clustering and dfo£ifi&^^ species, in 366 

%erifwei^ 

OiOS^nd^using^ Mane^lorato^ 
usedt6 dste^ piusters;in^ 
-15 jlnimanstb:^ 
jtte<^ 
Spe^^ 

^ed)for classification consisted of 366 lipid peaks c^ 
jshpwiL figire reyeds two main grp^, cpnfe 
20 samples::27 o£$t^3 rorift^l^^ 11 of 

the 14 dSeased htmi^i^ all disced rod^ts belo^ Itis concluded from 

this analysis thaiif the di%abs^ it coulddedufced with high 

iacautety%^ 

Forclass^^ 

25 which the 366 rodent lipid measur set ahdrthe corresponding; 

366 human lipid measurements^ Thepetcentage of human samples 

confcc^c^ (39 of the^2 s^ples) 

:as;S^m,Figucb .51. figure 51 shows ^success rate of S^^ luiear classifier as a function , 
of number of ^ In this analysis, thecodont data arevusedform^ andvthe 

success rate is the percentage of rodents correctly classified in a leave-one-out procedure. Also,
in this analysis, the human data are used as a test set, and the success rate is the percentage of
humans correctly classified by the rodent model. Further investigation of the classification and
peak reduction procedures may lead to the confirmation that the diseased rodent model is a good
model for metabolic disease in humans.

Results and discussion for the comparison of rodent samples with human samples -
Common components. A comparison of the 571 LC/MS lipid peaks that were common to both
species revealed that there were significant mean differences in both species between the
diseased and control groups (at a significance level of 0.05 and using two-sided pairwise t-tests)
for 195 out of the 571 lipid LC/MS peaks. Of these 195 peaks, 185 exhibited the same trend in
both species (higher or lower serum abundance in diseased vs. control). In addition, a number of
correlations between pairs of lipid peaks were present both in the human and rodent samples,
using an absolute value of Pearson correlation coefficient greater than 0.7, indicating that not
only were the abundance differences conserved, but also that underlying mechanisms involved in
the regulation of those lipid levels may likely be conserved across species. An excerpt of the
results is summarized in Figure 52.

More specifically, Figure 52 shows a comparison of lipid abundance changes and
correlations across human and rodent species. In the figure, the large circles consist of elements,
each of which represents a different LC/MS lipid peak. The shading of the elements
corresponds to the relative abundance of the lipid in diseased vs. control samples. The relative
abundances are normalized group mean differences. There are 195 such elements, all
representing lipids with p < 0.05. The outer large circle represents the diseased rodent vs. control
rodent group comparison, while the inner concentric circle represents the diseased human vs.
control human group comparison. The lines connecting pairs of elements in the figure are
correlations, of Pearson coefficient |C_ij| > 0.70, which are present in both species.
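
The selection of correlations that are present in both species can be sketched as below. This is an illustration only: the peak names and intensities are hypothetical, and the 0.7 threshold on the absolute Pearson coefficient is taken from the text.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def strong_pairs(peaks, threshold=0.7):
    """peaks maps peak_id -> intensities across the samples of one species.
    Returns the set of peak pairs with |r| above the threshold."""
    return {(p, q) for p, q in combinations(sorted(peaks), 2)
            if abs(pearson(peaks[p], peaks[q])) > threshold}

def conserved_pairs(human, rodent, threshold=0.7):
    """Pairs of shared peaks that are strongly correlated in both species."""
    shared = set(human) & set(rodent)
    in_human = strong_pairs({k: human[k] for k in shared}, threshold)
    in_rodent = strong_pairs({k: rodent[k] for k in shared}, threshold)
    return in_human & in_rodent

# Hypothetical intensities; the two species need not have the same number of samples.
human = {"lipid_A": [1, 2, 3, 4], "lipid_B": [2, 4, 6, 8.5], "lipid_C": [4, 1, 3, 2]}
rodent = {"lipid_A": [1, 2, 3], "lipid_B": [1, 2.1, 2.9], "lipid_C": [1, 3, 2]}
conserved = conserved_pairs(human, rodent)
```

Each pair in `conserved` would correspond to one connecting line in a figure of this kind.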

Summary and conclusions. Metabolite and protein analyses of blinded serum samples
from animal and human subjects were performed which allowed grouping of the samples based
on their serum metabolite and protein profiles. Groups identified using clustering analysis
reflected with 100% accuracy the phenotypic categories of the animal subjects and with a high
degree of accuracy (>80%) the human subjects. Subsequent analyses identified many of the
molecular components that differentiate the subjects.

These independent measures are informative in themselves. Moreover, when linked
using correlation networks, one begins to see details of the biochemical processes that underlie
the disease or drug response. One of the more interesting results is that the molecular
components that differentiate the diseased rodents from the control rodents are very similar to
those that differentiate the diseased humans from the control subjects. The wealth of data






gen^ted!^ 

Nomenclature / Terms Used In This Example 
Abbreviations and Terms
COSA: Clustering Objects on Subsets of Attributes

LC: Liquid Chromatography

LC/MS: Liquid Chromatography - Mass Spectrometry

NMR: Nuclear Magnetic Resonance

fffcrtein ; $^^ 
mass^^ 

15 • 

In;^ ofan iiutial sujryey scan : 

of peptide^e^ si^Ms'to 
MS/MS scans ibr'eaci^ 
targeted sequeni^ 

Example 6. Systems biology approach: Human cardiovascular disease 
3^^aLM 

car&bv^ciu^ In advMce of the study, thesubject 

samples were;classified into eitWdiseased.dr contrdl c^gori^ (plasn^ 

25 cardioyasciilarMisease and matched, control subjects). SeveM nl^^ 

NMR, LOZMS, and GC/MS teclmdlogies.and data.preprocessmg-sbflware were applied to the 
comparatiye,^dy ot : 80 plasma samples: Tlie metabolomics profiling^ generate 
datasets contaming of ^c^ peaiks that w^e initi^y ndt identified. Instead, peaks of 

statistical significance were determined. These entities were flagged for identification, using
databases, additional MS/MS data, and expert interpretation, in the second phase of the analysis.
Univariate and multivariate statistical analyses of the metabolomics datasets revealed measured
features that were significantly different between the two groups of study subjects. Prior to the
initiation of the second phase of the project, further classification of the diseased subjects on the
basis of a clinical index of disease severity was used, and additional statistical analyses were
performed to determine if any measured features correlated with the severity of the cardiovascular
disease in the diseased group. Numerous features showed significance in one or more analyses and
were identified; then, a correlation network was constructed to visualize statistical and biological
relationships among the identified, significant metabolites.

Objective. The goal of this study was to identify biomarker molecules as molecular
differences between plasma samples taken from cardiovascular disease patients and matched
control subjects.
Study design. The study was conducted in two phases:

• Phase I: Metabolomics platforms were employed to comparatively profile 80 plasma
samples described as being from either male cardiovascular disease patients (40 samples,
mean age 53.4 years) or age-matched control subjects (40 samples, mean age 51.6
years). The analytical platforms were CPMG NMR, Diffusion-edited NMR, GC/MS,
Lipid LC/MS, and Amino acid/global LC/MS. Software algorithms were used to extract
spectral and chromatographic peak information from the raw data. Additional
preprocessing was performed to align the peaks among the datasets from each platform
(i.e., chromatographic retention time alignment for LC/MS and GC/MS) for comparative
statistical analyses. The peaks remained unidentified until flagged for identification on
the basis of statistical significance. Identification activities were initiated on peaks that
had different levels of abundance between the two experimental groups.

• Phase II: Prior to the initiation of the second phase of the project, further classification of
the diseased subjects on the basis of the clinical index of disease severity was made and
additional statistical analyses were performed to determine if any measured features
correlated with the severity of the disease in the diseased group. Where possible, further
identification information was obtained for the features. A correlation
network was then constructed to visualize statistical and biological relationships among
the identified, significant metabolites.

Summary of methods. A number of analytical methods were used that enable the
comparative profiling of a wide range of metabolites. The samples were analyzed using several
analytical methods, and statistics were performed on unidentified peaks. Listed and briefly
described below are the methods that were used.




(i) CPMGNMR: enhanced NMRfr of low molecular wd^t metabolites 

at concentrations greater than 100 (e<g* amino acids, amino acid metabolites, 
organic acids, sugars), 
(iij GC/MS: global method designed fbrprofilirig,^ metabolites 
5 classes (ifrg., alcohols, fldeKydes and ^ amino acids, acyl amino 

acids, racxinylainino adds, amm^^ 

C6), orgMc taids^^ stig^ acids, sugar amines, 

sugar phosphates), 
(iii) Ogids LG/MS: optiim^dfbrpri^^ 
-10 - : ™ ^ lppphospHcKpids, phospholipids; cM^etpl esj^, (Kacylglycerdls, " r 

triacylgjycerols) 
;(i$ ^ opthr&ed^ 

metabolites. Due to the presence ofcitmte, i^sed ^ a blbqd anticq 
platform^ 

; IS '(V) DiSusipn-^ted.^ ehhaii^ 

metebbljtes; l>e profiled p^s ^ from many lipid 

entities preferied as bid 

Each of the above analyses yielded raw datasets that contain hundreds of
peaks per sample. In order to enable comparative analysis of metabolite peak information across
the entire sample set, several algorithms were applied to each raw data file for peak detection and
signal integration. Next, to compensate for shifts in peak position that may occur in terms
of retention time for LC/MS and GC/MS, or differences in chemical shift for
the NMR techniques, algorithms were used to align the peaks. As a result of this process, each
metabolite peak within a profile was assigned a peak identification number (or index number).
This same identification number was used to describe the analogous peak found in the profiles
from all other samples and therefore enabled comparative analyses of the integrated peak
intensities.
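
The assignment of a shared index number to analogous peaks can be illustrated with a deliberately simple sketch. The alignment algorithms actually used are not described in the text, so the greedy retention-time binning below, the tolerance value, and the sample data are all assumptions.

```python
def assign_peak_indices(samples, tolerance=0.1):
    """samples maps sample_id -> list of (retention_time, integrated_intensity).
    Peaks whose retention times fall within `tolerance` of the first peak of a
    bin share one index. Returns peak_index -> {sample_id: intensity}."""
    observations = sorted(
        (rt, sample_id, intensity)
        for sample_id, peaks in samples.items()
        for rt, intensity in peaks
    )
    table = {}
    index = -1
    anchor_rt = None
    for rt, sample_id, intensity in observations:
        # Open a new bin when the peak lies beyond the tolerance window.
        if anchor_rt is None or rt - anchor_rt > tolerance:
            index += 1
            anchor_rt = rt
            table[index] = {}
        # If one sample contributes two peaks to a bin, their intensities are summed.
        table[index][sample_id] = table[index].get(sample_id, 0.0) + intensity
    return table

# Hypothetical peak lists (retention time in minutes, integrated intensity).
samples = {
    "s1": [(1.00, 10.0), (2.50, 4.0)],
    "s2": [(1.04, 12.0), (2.47, 5.0)],
    "s3": [(0.98, 9.0)],
}
aligned = assign_peak_indices(samples)
```

The resulting table has one row per peak index, so intensities of the analogous peak can be compared across samples; a sample missing from a row simply had no peak detected in that window.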

Following univariate and multivariate statistical analyses of the data from each platform,
metabolites that differentiated the diseased and healthy subjects were listed for identification in
Phase II as ranked by the applied statistics.






Univariate results. Subsequent to data alignment and normalization, univariate
homoscedastic t-tests with controls for false discovery rates were performed on identified
metabolite analytes from all bioanalytical platforms used in the present study. Results showed
twenty-four analytes which have adjusted p-values less than 0.05 based on a 10% false discovery
control using the Benjamini-Hochberg approach.
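
The Benjamini-Hochberg adjustment named above can be sketched as follows. The p-values are hypothetical, and the cutoff is left as a parameter because the text mentions both 0.05 and a 10% false discovery level.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_analytes(p_values, cutoff):
    """Indices of analytes whose adjusted p-value is below the chosen cutoff."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q < cutoff]

# Hypothetical raw p-values from per-analyte t-tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
selected = significant_analytes(raw, cutoff=0.03)
```

With these inputs the two smallest raw p-values adjust to 0.02 and the other two to 0.04, so only the first and last analytes pass a cutoff of 0.03.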

Multivariate results. A multianalyte approach to finding sets of spectral peaks capable
of categorizing diseased samples and control samples was also pursued. In the literature, this
problem of finding a biomarker composed of more than one molecular component able to
segregate groups of samples is referred to as a 'classification problem.' In the present case, only
those analytes which had been uniquely identified were used; there were ninety-four
such analytes at the time of the analysis. This number does not include isotopes, adducts,
redundant NMR resonance peaks, and the like, which also may have been identified. The
challenge of classification, in brief, is to determine a multianalyte biomarker composed of the
minimal number of most informative analytes.
15 inconsidermgbio 

were considered. These include determining which subset of analytes is the optimal one to
include in the marker; how well the final biomarker performs in correctly classifying the sample
set at hand; and how well the final biomarker performs in correctly classifying samples from an
independent sample set. In addition to the above items, the biochemical relevance of the
components constituting the biomarker is also important, as is the feasibility of developing a
practical diagnostic assay for the final biomarker. With the latter in mind, the minimal optimal
number of analytes which will achieve the best predictive performance criteria was determined.
Figure 53 depicts the outline of the steps of this analysis. In general, multiple data sets from
multiple analytical platforms are concatenated, integrated, and correlated, and then are
normalized. This data is further analyzed through a supervised clustering analysis to obtain a
profile of a biological system. A brief overview of the methodology of constructing a
multianalyte biomarker is presented below.

In order to determine the minimal optimal subset of spectral peaks which best segregates
disease and control samples, an approach known as Recursive Feature Elimination is used. This
approach proceeds as follows.

1. Choose a 'classification algorithm' which accepts as input N components (i.e., N spectral
peaks), and returns (i) the success of segregating control and disease samples (as
measured by specificity and sensitivity) achieved by a linear combination of the N
components, and (ii) a ranking of the N input components based on their contribution to
the classification.

2. Allow all analytes (aligned, normalized, and pre-processed) as input to the classification
algorithm.

3. Witii^se roi^one^ 

combination of input -andytes to'lSe 0&i tb classify control and disease samples. 

4. R^ordthera^^ 
coefficients iii^^ 

.40 - -^^^algc^ 

V^datidn iterations). 
'5;v Ctomjn^ 

r ciiassi^ing control an&disease s^^ mefl^d'(dij5ci^sei 

b^^^as^^^ 
$5"; :.. 6: Remove;lEie analyte.w 

8. Determined higher ^cce^s in 

segregating cqn^ 
•y; cttmbin^im^ 
20 Mffespon^g to e^ 

The term 'Recursive Feature Elimination' reflects the successive pruning of the list of spectral
peaks by one spectral peak for each iteration of Steps 3 through 6.
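
The pruning loop can be illustrated with a compact sketch. Several points are assumptions rather than the method of this study: a plain gradient-descent logistic regression stands in for the classification algorithm, the analyte removed in each round is the one with the smallest absolute coefficient (the corresponding step in the text is not legible), and training accuracy replaces the cross-validated sensitivity and specificity. Peak names and values are hypothetical, and features are assumed to be on a comparable scale.

```python
import math

def train_logistic(rows, labels, epochs=500, rate=0.1):
    """Batch gradient-descent logistic regression; returns (weights, bias)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / len(rows)
    return weights, bias

def accuracy(rows, labels, weights, bias):
    """Fraction of samples on the correct side of the linear decision boundary."""
    hits = 0
    for x, y in zip(rows, labels):
        z = bias + sum(w * v for w, v in zip(weights, x))
        hits += int((z > 0) == (y == 1))
    return hits / len(rows)

def recursive_feature_elimination(rows, labels, names):
    """Returns one (remaining_feature_names, training_accuracy) entry per round."""
    active = list(range(len(names)))
    history = []
    while active:
        subset = [[row[j] for j in active] for row in rows]
        weights, bias = train_logistic(subset, labels)
        history.append(([names[j] for j in active],
                        accuracy(subset, labels, weights, bias)))
        # Assumed elimination rule: drop the smallest absolute coefficient.
        weakest = min(range(len(active)), key=lambda k: abs(weights[k]))
        del active[weakest]
    return history

# Hypothetical data: 1 = disease, 0 = control; peak_b carries no class information.
rows = [[2.0, 0.1, 0.5], [1.5, -0.2, 0.1], [1.8, 0.0, 0.6],
        [-2.0, 0.1, -0.5], [-1.5, -0.2, -0.1], [-1.8, 0.0, -0.6]]
labels = [1, 1, 1, 0, 0, 0]
history = recursive_feature_elimination(rows, labels, ["peak_a", "peak_b", "peak_c"])
```

Scanning `history` for the smallest subset that keeps the best performance would give the minimal optimal set; in this toy example the uninformative `peak_b` is pruned first and `peak_a` survives to the last round.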

ffiihe present sM This algorithm involves ia 

state-of-the-art approach referred to as a 'Logistic Classifier' (Anderson, 1982). This method
25 has its origins in h^d\rating an^ It is designed to select for a. 

final bibmarker comprising comp^^ 

redundancy and minimize biomarker size. While the general principles of the technique are
known, the current analysis optimizes it to work with data derived from the particular
bioanalytical profiling platforms discussed earlier.
There are two different tests of performance which have been applied for the processes
outlined in this section.

1. 'Cross-Validation Performance' is the classification performance of a biomarker which has
been constructed based on a subset of the available samples, and tested on the remaining
samples which have been a priori intentionally left out (Hastie, 2001). A typical
situation for the present study is to construct a biomarker based only on thirty-four (34)
diseased samples and thirty-four (34) control samples chosen at random, and to test the
performance (classification success) of the resultant biomarker in classifying the
remaining six (6) diseased and six (6) control samples which were excluded. This
process is repeated successively many times, with different sets of randomly chosen 6+6
samples 'left out'. The reported 'Cross-Validation Performance' for the biomarker is the
averaged performance of many such permutations; typically ten cross-validation rounds
are used.

It is important to note that the purpose of Cross-Validation is to assess the
generalizability of a biomarker, within the limitations posed by the availability of a
relatively limited number of samples. In the absence of an independent
sample set of patients, the Cross-Validation Performance is an

estimation of the performance of the biomarker on an independent test set of samples.
Such an extrapolation is made possible by measuring the performance of the biomarker
on the many permutations and combinations of subsets of the available samples; this
process effectively simulates a situation in which many more samples are available.
2. 'Permutation Performance' is the performance of the multivariate biomarker selection
algorithm when sample labels have been randomly permuted. This occurs over many such
random permutations, and the average performance is reported. A robust classifier, one
which is not overfit to the training set, should yield a permutation performance of
approximately 50% (i.e., chance performance).
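
Both performance tests can be sketched together. This is an illustration under stated assumptions: a nearest-centroid rule stands in for the biomarker-selection algorithm, the data are synthetic (40 well-separated samples per class), and the random seeds and the number of permutations are arbitrary. The 6+6 hold-out and the ten rounds follow the text.

```python
import random

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest(model, x):
    """Class whose centroid is closest to x (stand-in classifier)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))

def holdout_round(rows, labels, n_out, rng):
    """Leave out n_out random samples per class, train on the rest, score the left-out ones."""
    classes = sorted(set(labels))
    test_idx = set()
    for c in classes:
        members = [i for i, l in enumerate(labels) if l == c]
        test_idx.update(rng.sample(members, n_out))
    train = [i for i in range(len(rows)) if i not in test_idx]
    model = {c: centroid([rows[i] for i in train if labels[i] == c]) for c in classes}
    return sum(nearest(model, rows[i]) == labels[i] for i in test_idx) / len(test_idx)

def cross_validation_performance(rows, labels, n_out=6, rounds=10, seed=0):
    """Average hold-out success over several random 6+6 splits."""
    rng = random.Random(seed)
    return sum(holdout_round(rows, labels, n_out, rng) for _ in range(rounds)) / rounds

def permutation_performance(rows, labels, n_permutations=20, seed=1):
    """Average cross-validation performance after randomly permuting the labels."""
    rng = random.Random(seed)
    total = 0.0
    for k in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        total += cross_validation_performance(rows, shuffled, seed=k)
    return total / n_permutations

def make_data(seed=42):
    """Synthetic, clearly separated classes: 40 control and 40 diseased samples."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, center in (("control", 0.0), ("diseased", 5.0)):
        for _ in range(40):
            rows.append([center + rng.uniform(-1, 1) for _ in range(3)])
            labels.append(label)
    return rows, labels

rows, labels = make_data()
cv = cross_validation_performance(rows, labels)
perm = permutation_performance(rows, labels)
```

On such separable synthetic data the cross-validation performance is perfect, while the permuted-label performance should fall near chance; a permuted-label performance well above 50% would suggest that the procedure is fitting noise.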

Results and discussion. The results of these classification methods are graphically
shown in Figure 54. A biomarker set of fifteen molecular components was identified as part of a
profile of human cardiovascular disease. These molecular components of the biomarker set
were discovered by using multivariate statistical analysis methods and integration of a plurality
of datasets including those for more than one type of measurement technique and those for more
than one biomolecular component type as shown in Figure 56. This methodological approach
was used successfully to generate a biomarker set which could classify the 80 samples. Figure
55 shows the classification of each subject as a disease or control group member using these
biomarkers. A sensitivity of 93% and a specificity of 94% were obtained.
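
For reference, sensitivity and specificity can be computed from per-subject classifications as in the sketch below. The labels shown are hypothetical and are not the study data.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="disease"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical classifications for eight subjects.
truth = ["disease"] * 4 + ["control"] * 4
predicted = ["disease", "disease", "disease", "control",
             "control", "control", "control", "disease"]
sens, spec = sensitivity_specificity(truth, predicted)
```

Here three of four diseased subjects and three of four controls are classified correctly, giving 0.75 for both measures.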






The abbreviations used in this example are, where appropriate, the same as those used in
Example 5.

Each of the patent documents and scientific publications disclosed hereinabove is
incorporated by reference herein for all purposes.

Although the invention has been particularly shown and described with reference to
specific embodiments, it should be understood by those skilled in the art that various changes in

or se($e oft^ t^fbregomg e^^ 

Respects ilhistr^ The.scope of the 

10; invenfionis thus^'M 
aUchanges^^^ 
intepd^to 






What is claimed is:

1. A method of profiling a state of a biological system in a mammal, the method
comprising the steps of:
(a) evaluating with statistical analysis a plurality of data sets of a biological system
and comparing features among the plurality of data sets to determine one or more sets of
differences among at least a portion of the plurality of data sets; and
(b) developing a profile for a state of the biological system based on the results of
step (a),
wherein the plurality of data sets comprise measurements derived from more than one
biological sample type, more than one type of measurement technique, more than one
biomolecular component type, or a combination of a biological sample type, a
measurement technique, and a biomolecular component type.

2. The method of claim 1 wherein the biological system is in a human.

3. The method of claim 1 wherein the statistical analysis comprises multivariate
analysis.

4. The method of claim 1 wherein the biological sample type is selected from the
group comprising blood, plasma, serum, cerebrospinal fluid, bile, saliva, synovial fluid, pleural
fluid, pericardial fluid, peritoneal fluid, sweat, feces, nasal fluid, ocular fluid, intracellular fluid,
intercellular fluid, lymph, urine, liver cells, epithelial cells, endothelial cells, kidney cells,
prostate cells, blood cells, lung cells, brain cells, skin cells, adipose cells, tumor cells, and
mammary cells.

5. The method of claim 1 wherein a plurality of data sets are derived from one
biological sample type that is treated differently, or from one biological sample type that is
collected or analyzed at different times.

6. The method of claim 1 wherein the measurement technique is selected from the
group comprising liquid chromatography, gas chromatography, high performance liquid
chromatography, capillary electrophoresis, mass spectrometry, liquid chromatography-mass
spectrometry, gas chromatography-mass spectrometry, high performance liquid chromatography-
mass spectrometry, capillary electrophoresis-mass spectrometry, nuclear magnetic resonance

6 ; spirometry, parcel hybn^ 

j, H\ "piemethodtf^ 

2 irom d^ 

1 8* The meliidd of claim 1 ^erdn^e bidmoleciilar componenttype is a gene, a 

2- . g^e trMisc^^ 

L ' 5; v - Themethod^f^ 

i 

•2 f ^ 

;t $1. ^^cle of manufe 

2 v fcam^^ 

1 ^ 12§: Amethbio^^ 

2 comprising the steps of : 

fB.;'; (a)J evaluating ^ data siefts for a biomolecuiaf . 

4 component^type and con^ar^ 

;5j more^setsofi^ 

JJ>) eyaluathig^ 

7 compira^it^ 

: $T more setsof ;diff^ 

(c) correlating the results of step (a) and step (b) to develop a profile for a state of the
biological system.

1 13; ITjfiMet3iodofclaimI2^ 

2 wmponent type :dr another hiom^ comprise measurraients derived from: 

3 more th^^dne;hidiopcal sample.t^ 

combination of a biological sample type and a measurement technique.

14. The method of claim 12 wherein the biomolecular component type is a protein

2 andthebfterbioiM^ 






15. The method of claim 12 wherein the biomolecular component type is a gene

16. A method of profiling a state of a biological system in a mammal, the method
comprising the steps of:
(a) evaluating with statistical analysis a plurality of data sets comprising
measurements from at least two biomolecular component types and comparing features among
the plurality of data sets to determine one or more sets of differences among at least a portion of
the plurality of data sets; and
(b) developing a profile for a state of the biological system based on the results of
step (a).

17. The method of claim 16 wherein the plurality of data sets comprise measurements
derived from more than one biological sample type, more than one type of measurement
technique, or a combination of a biological sample type and a measurement technique.

18. The method of claim 16 wherein the step of evaluating comprises:
evaluating a plurality of data sets for a biomolecular component type and comparing
features among the plurality of data sets to determine one or more sets of differences among at
least a portion of the plurality of data sets; and
evaluating a plurality of data sets for another biomolecular component type and
comparing features among the plurality of data sets to determine one or more sets of differences
among at least a portion of the plurality of data sets.

19. The method of claim 16 wherein the at least two biomolecular component types
comprise a protein and a metabolite.

20. The method of claim 16 wherein the at least two biomolecular component types
comprise a gene transcript and a metabolite.






Figure 3 (sheet 3/60). Hybridization. Legend: wild type, APOE*3. Samples: 2: WT-15; 3: WT (number illegible); 4: WT-20; 5: APOE*3-2; 6: APOE*3-3; 7: APOE*3-4; 8: APOE*3-5.



Sheet 4/60. Plot; axis label: p-value (log 10).
Sheet 8/60. Plots; axis labels: standard deviation (expression); probability.
Sheet 10/60. Plot; axis label: p-value (log 10).







Figure 15 (sheet 15/60). Gene Expression Analysis. Mouse liver tissue; mRNA hybridization; wild type vs. APOE3.
Figure 16 (sheet 16/60). Mouse liver tissue; mRNA hybridization.
Figure 17 (sheet 17/60).
Figure 18 (sheet 18/60). Mouse plasma; protein (peptides); ESI-Ion Trap; IMPRESS™ algorithm. Compilation of 10 LC-MS datasets.
Figure 19 (sheet 19/60). Mouse plasma; protein (peptides); ESI-Ion Trap; IMPRESS™ algorithm. Axis: Time (min).
Figure 20 (sheet 20/60).
Figure 22. Axis: time (min).



Figure 23 (sheet 23/60). Axis: Variable Index.
Figure 24 (sheet 24/60). PCDA Score Plot (axis D2); labeled points include ApoE3-7, ApoE3-3, and WT-15. Mouse plasma; protein (peptides); ESI-Ion Trap; IMPRESS™ algorithm.
Figure 25 (sheet 25/60). Identify using MS/MS. Mouse plasma; protein (peptides); ESI-Ion Trap; IMPRESS™ algorithm; pattern recognition.



Figure 26 (sheet 26/60). MS/MS spectrum from sample ApoE3-7 (scans 768-783, RT 27.60-28.01), precursor labeled as the m/z 1366 +2 ion, with annotated b- and y-series fragment ions and the label hApoE3. Axis: m/z.








Figure 30A
Figure 30B
Figure 31A
Figure 31B
Figure 31C
Figure 32A
Figure 32B
Figures 33A and 33B (sheet 33/60). Labels include L-FABP, hApoE*3, ApoA-1, LysoPC, PC, and TG species.
Figure 34 (sheet 34/60).
Figure 36 (sheet 36/60). Axis ticks: 5.0, 4.0, 3.0, 2.0, 1.0.



Figure 38 (sheet 38/60). Difference Factor Spectrum. Axis: Component.
Figure 39 (sheet 39/60). WT and TG plasma metabolic LC-MS profiles; traces: original TIC and IMPRESS-filtered. Axis: Scan number.
Figure 40 (sheet 40/60).
Figure 41 (sheet 41/60). PCDA score plot, D1 vs. D2.
Figure 42 (sheet 42/60). Difference Factor Spectrum. Axis: Component.



Sheet 43/60. Flowchart; legible steps: Normalization, Pattern Recognition, Pattern Signatures, Statistical, Discriminating Components, Component Identification.
Figure 44A (sheet 45/60).
Figure 44B (sheet 46/60).



WO 2005/020125 PCT/US2004/027022 

48/60 



100.0 



"10.0 




<c- CM CO 

"a, 3 tj 

^ 2 S S 

OL CL D- -I 



Figure 45A 






Protein 2

Metabolite 3

Lipid 5

Metabolite 1

Metabolite 2

Protein 3

Lipids (LC-MS)
NMR (DE)
NMR (CPMG)
Peptides
Clinical

positive correlation
negative correlation

higher in treated group
lower in treated group

Figure 46






Clinical 




Figure 47 






Protein A
(Protein 1)

Protein B

Protein C

Protein D
(Protein 2)

Figure 48







Number of Lipid LC-MS Peaks (axis values: 366, 256, 128, 64, 32, 16, 8, 4, 2)

Figure 51






C20:4 lipid
C20:3 lipid
C34:2 lipid
C48:2 lipid
C50:4 lipid
C50:3 lipid
C50:2 lipid
C50:1 lipid
C58:4 lipid
C52:2 lipid

inner circle - diseased humans vs. control humans
outer circle - diseased rodents vs. control rodents

lipid lower in disease
lipid higher in disease
negative correlation
positive correlation

Figure 52






Fifteen Biomarker Analytes

Analyte         Weight in Biomarker (arb. units)    Platform
Lipid 1         0.42                                Lipid LC-MS
Lipid 2         0.33                                Lipid LC-MS
Metabolite 1    0.31                                GC-MS
Metabolite 2    0.30                                NMR
Metabolite 3    0.30                                GC-MS
Metabolite 4    0.25                                GC-MS
Lipid 3         0.24                                Lipid LC-MS
Metabolite 5    0.23                                GC-MS
Lipid 4         0.21                                Lipid LC-MS
Metabolite 6    0.20                                GC-MS
Metabolite 7    0.18                                NMR
Lipid 5         0.18                                Lipid LC-MS
Lipid 6         0.17                                Lipid LC-MS
Lipid 7         0.04                                Lipid LC-MS
Lipid 8         0.01                                Lipid LC-MS



Figure 56 






